The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the present disclosure.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
Augmented Reality (AR) systems, Virtual Reality (VR) systems, and Mixed Reality (MR) systems, collectively referred to as Extended Reality (XR) systems, are a budding segment of today's personal computing systems. XR systems, especially wearable XR systems such as head-mounted XR systems, may be poised to usher in an entirely new era of personal computing by providing users with persistent “always-on” assistance, which may be integrated seamlessly into the users' day-to-day lives without being disruptive. In contrast to more traditional personal computing devices, such as laptops or smartphones, XR devices may be capable of displaying outputs to users in a more accessible, lower-friction manner. For example, some head-mounted XR devices may include displays that are always within the users' fields of view and through which the XR devices may present visual outputs to the users. In some instances, head-mounted XR devices may tightly couple displayed outputs to the users' physical environments (e.g., by placing labels or menus on real-world objects) such that users may not need to look away from their physical environments to consume the displayed outputs.
In contrast to traditional personal computing devices, XR devices often rely on input modalities (e.g., hand gestures or speech) that are cumbersome, ambiguous, lower-precision, and/or noisier, which may make the information and/or options provided by traditional XR devices physically and/or cognitively fatiguing to access and difficult to navigate. Additionally, in some instances, these input modalities may not always be driven by intentional interactions with the XR devices. For example, a user of an XR device may point for emphasis during conversation but not intend the pointing to indicate a targeting or selection input for the XR device. Similarly, a user may say a word or phrase associated with a voice command of an XR device during conversation without intending to trigger the XR device to perform an action associated with the voice command.
Unlike traditional personal computing devices, XR devices often have interaction environments that are unknown, less known, or not prespecified, which may cause some XR systems to consume considerable amounts of computing resources to discover objects within such environments with which users of the XR devices may interact. If users have no immediate intentions to interact with the objects in their environments, any resources consumed in discovering the objects and/or user interactions may be wasted. Additionally, if an XR device is capable of presenting information about and/or options for interacting with objects in the users' environment, users may be distracted or annoyed by the information and/or options whenever the users have no immediate intentions to interact with the objects in their environments.
The present disclosure is generally directed to systems and methods for using biosignals (e.g., eye-tracking data or other biosignals indicative of gaze dynamics, such as pupil dynamics) to anticipate and signal, in real time, the temporal onset of a user's intent to interact with the disclosed systems. In some embodiments, the disclosed systems may anticipate when a user intends to interact (e.g., a user's intention to perform a selection or a user's intention to provide user input) and/or may intelligently facilitate the user's interaction or input in a way that reduces the physical and cognitive burden on the user (e.g., via adaptive and/or predictive interfaces). By anticipating the timing of a user's intent to interact, the systems and methods disclosed herein may responsively drive ultra-low-friction predictive interfaces to avoid overburdening the user with all of the potential actions or user-interface elements available to the user. In some embodiments, the disclosed systems and methods may generate signals indicating the timing of a user's intent to interact with a computing system that may allow intelligent facilitation systems to provide adaptive interventions at just the right time.
Some embodiments of the present disclosure may predict the onset of a user's intent to interact without first gathering or relying on knowledge of the user's environment and/or the user's gaze point in that environment. In some embodiments, the disclosed systems may refrain from gathering such knowledge (e.g., to discover objects within the environment with which the user may interact) until after the onset of the user's intent to interact is detected, which may conserve system resources during periods when the user does not intend to interact with the disclosed systems.
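The deferred-discovery behavior described above can be sketched as follows. This is a minimal illustration under assumptions, not the disclosed implementation: `FacilitationSubsystem`, `run_pipeline`, the stream of per-frame intent probabilities, the 0.5 threshold, and the rising-edge rule are all hypothetical, and `discover_objects` stands in for whatever (assumed costly) environment scan a real system might run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FacilitationSubsystem:
    """Hypothetical intelligent-facilitation subsystem that defers object discovery."""
    discover_objects: Callable[[], List[str]]  # assumed expensive environment scan
    discovery_calls: int = 0
    candidates: List[str] = field(default_factory=list)

    def on_intent_signal(self) -> Optional[str]:
        # Object discovery runs only after an intent-to-interact signal arrives.
        self.discovery_calls += 1
        self.candidates = self.discover_objects()
        # Placeholder ranking: treat the first discovered object as most likely.
        return self.candidates[0] if self.candidates else None


def run_pipeline(intent_probs, facilitator, threshold=0.5):
    """Send one intent signal per rising edge of the predicted intent probability."""
    targets = []
    was_above = False
    for p in intent_probs:
        is_above = p >= threshold
        if is_above and not was_above:
            targets.append(facilitator.on_intent_signal())
        was_above = is_above
    return targets
```

With this design, a stream of low probabilities never triggers a scan, so no discovery cost is paid while the user shows no intent to interact.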
Features from any of the embodiments described herein may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
The following will provide, with reference to
In some embodiments, example system 100 may enable a user to interact with various types and forms of objects. For example, example system 100 may include one or more user interfaces (e.g., user interface(s) 111) with which a user may interact with objects associated with example system 100. In some examples, example system 100 may enable a user to use example system 100 to interact with physical objects (e.g., a television, a light, a smart device, an Internet-of-Things (IoT) device, etc.) in the user's environment. In some examples, example system 100 may present virtual objects to a user with which the user may use example system 100 to interact. In some examples, example system 100 may present a menu of options or commands (e.g., as part of a graphical user interface) that the user may interact with in order to control example system 100 and/or other physical or virtual objects that are made interactable by, presented by, or otherwise associated with example system 100. In some embodiments, a physical object may be considered to be associated with example system 100 if example system 100 enables a user to interact with the physical object. In some embodiments, a virtual object may be considered to be associated with example system 100 if example system 100 presents (e.g., visually presents) the object to the user.
As illustrated in
As illustrated in
In some embodiments, one or more of targeting subsystem(s) 101 and/or one or more of interaction subsystem(s) 103 may represent or collectively form all or a portion of a user-input subsystem, such as a point-and-click or a point-and-select user-input system, of example system 100.
Returning to
Returning to
In some embodiments, environmental sensor(s) 109 may represent or include one or more sensing devices capable of generating real-time signals indicative of one or more characteristics of users' environments. In some embodiments, environmental sensor(s) 109 may collect, receive, and/or identify data that indicates, either directly or indirectly, objects within a user's environment with which a user may interact. Examples of environmental sensor(s) 109 include, without limitation, cameras, microphones, Simultaneous Localization and Mapping (SLAM) sensors, Radio-Frequency Identification (RFID) sensors, variations or combinations of one or more of the same, or any other type or form of environment-sensing or object-sensing device or system.
As further illustrated in
Intent-predicting model(s) 140 may represent or include any machine-learning model, algorithm, heuristic, data, or combination thereof, that may anticipate, recognize, detect, estimate, predict, label, infer, and/or react to the temporal onset of a user's intent to interact with example system 100 based on and/or using biosignals acquired from one or more biosensors, such as biosensors 107. Examples of intent-predicting model(s) 140 include, without limitation, decision trees (e.g., boosting decision trees), neural networks (e.g., a deep convolutional neural network), deep-learning models, support vector machines, linear classifiers, non-linear classifiers, perceptrons, naive Bayes classifiers, any other machine-learning or classification techniques or algorithms, or any combination thereof.
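As one concrete instance of the linear classifiers listed above, the sketch below fits a logistic-regression model with plain stochastic gradient descent. It assumes each input is a fixed-length vector of already standardized biosignal features (zero mean, comparable scale) and each label marks whether an interaction onset follows; the function names, learning rate, and epoch count are illustrative choices, not values taken from this disclosure.

```python
import math
import random


def _sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Fit p(intent | x) = sigmoid(w . x + b) by SGD on the log loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Gradient of the log loss with respect to the logit is (p - y).
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that an interaction onset follows feature vector x."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

A boosted decision tree or neural network could replace this model without changing its role: mapping a feature vector to an intent probability.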
The systems described herein may train intent-to-interact models, such as intent-predicting model 140, to predict the timing of user interactions in any suitable way. In one example, the systems may train an intent-to-interact model to predict when a user is starting to and/or about to perform an interaction using a ground-truth time series of physiological data that includes physiological data recorded before and/or up to the interaction. In some examples, the time series may include samples preceding a user's interactions by approximately 10 ms, 50 ms, 100 ms, 200 ms, 300 ms, 400 ms, 500 ms, 600 ms, 700 ms, 800 ms, 900 ms, 1000 ms, 1100 ms, 1200 ms, 1300 ms, 1400 ms, 1500 ms, 1600 ms, 1700 ms, 1800 ms, 1900 ms, or 2000 ms. Additionally or alternatively, the time series may include samples preceding a user's interactions by approximately 2100 ms, 2200 ms, 2300 ms, 2400 ms, 2500 ms, 2600 ms, 2700 ms, 2800 ms, 2900 ms, 3000 ms, 3100 ms, 3200 ms, 3300 ms, 3400 ms, 3500 ms, 3600 ms, 3700 ms, 3800 ms, 3900 ms, 4000 ms, 4100 ms, 4200 ms, 4300 ms, 4400 ms, 4500 ms, 4600 ms, 4700 ms, 4800 ms, 4900 ms, 5000 ms, 5100 ms, 5200 ms, 5300 ms, 5400 ms, 5500 ms, 5600 ms, 5700 ms, 5800 ms, 5900 ms, 6000 ms, 6100 ms, 6200 ms, 6300 ms, 6400 ms, 6500 ms, 6600 ms, 6700 ms, 6800 ms, 6900 ms, 7000 ms, 7100 ms, 7200 ms, 7300 ms, 7400 ms, 7500 ms, 7600 ms, 7700 ms, 7800 ms, 7900 ms, 8000 ms, 8100 ms, 8200 ms, 8300 ms, 8400 ms, 8500 ms, 8600 ms, 8700 ms, 8800 ms, 8900 ms, 9000 ms, 9100 ms, 9200 ms, 9300 ms, 9400 ms, 9500 ms, 9600 ms, 9700 ms, 9800 ms, 9900 ms, 10000 ms, 10100 ms, 10200 ms, 10300 ms, 10400 ms, 10500 ms, 10600 ms, 10700 ms, 10800 ms, or 10900 ms. In some embodiments, an intent-to-interact model may take as input a similar time series of physiological data.
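One way to turn a recorded session into such ground-truth training examples is sketched below under stated assumptions: each sample index is treated as the end of a window covering the preceding `window_ms`, and a window is labeled positive when an interaction begins within `horizon_ms` after it. The function `label_windows` and both parameters are hypothetical; the disclosure only lists candidate look-back durations.

```python
from typing import List, Sequence, Tuple


def label_windows(timestamps_ms: Sequence[float],
                  interaction_times_ms: Sequence[float],
                  window_ms: float,
                  horizon_ms: float) -> List[Tuple[int, int, int]]:
    """Return (start_idx, end_idx, label) for every sample used as a window end.

    The window spans samples with timestamps in [t_end - window_ms, t_end];
    label is 1 when an interaction starts within horizon_ms after t_end.
    """
    rows = []
    start = 0
    for end, t_end in enumerate(timestamps_ms):
        # Advance the window start past samples older than window_ms.
        while timestamps_ms[start] < t_end - window_ms:
            start += 1
        label = int(any(0 < t - t_end <= horizon_ms for t in interaction_times_ms))
        rows.append((start, end, label))
    return rows
```

For example, with samples every 100 ms, an interaction at 700 ms, `window_ms=300`, and `horizon_ms=200`, only the windows ending at 500 ms and 600 ms are labeled positive.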
In some embodiments, the disclosed systems may use one or more intent-predicting models (e.g., an intent-predicting model trained for an individual user or an intent-predicting model trained for a group of users). In at least one embodiment, the disclosed systems may train intent-to-interact models to make predictions for interaction intents that are on the scale of milliseconds or seconds.
As further illustrated in
As further illustrated in
System 100 in
As shown in
As illustrated in
The systems described herein may perform step 510 in a variety of ways.
$$\theta = 2\,\operatorname{atan2}\left(\lVert u - v \rVert,\ \lVert u + v \rVert\right) \tag{1}$$
where consecutive samples of gaze vectors 706 are represented as normalized vectors u and v and the corresponding angular displacement is represented as θ.
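Equation (1) can be evaluated directly. The sketch below assumes three-component gaze direction vectors and normalizes them first; the atan2 form is generally preferred over `acos(u·v)` because it stays numerically accurate for the very small angles typical of consecutive eye-tracker samples. The function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a gaze direction vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angular_displacement_deg(u, v):
    """theta = 2 * atan2(|u - v|, |u + v|) for unit vectors u and v, in degrees."""
    u, v = normalize(u), normalize(v)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(u, v)))
    return math.degrees(2.0 * math.atan2(diff, total))
```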
The disclosed systems may then calculate gaze velocities 714 from angular displacements 710 using a suitable gaze-velocity calculation 712. For example, the disclosed systems may divide each sample from angular displacements 710 (e.g., θ, as calculated above) by the change in time between associated consecutive samples from gaze vectors 706.
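A minimal sketch of this velocity calculation, assuming the displacements are in degrees, timestamps are in seconds, and `thetas_deg[i]` is the displacement between samples `i` and `i + 1` (the helper name is hypothetical):

```python
def gaze_velocities(thetas_deg, timestamps_s):
    """Velocity for pair i = theta_i / (t_{i+1} - t_i), in degrees/second."""
    if len(thetas_deg) != len(timestamps_s) - 1:
        raise ValueError("need one displacement per consecutive pair of timestamps")
    return [theta / (t1 - t0)
            for theta, t0, t1 in zip(thetas_deg, timestamps_s, timestamps_s[1:])]
```

Dividing by the measured time difference, rather than by a nominal sampling period, keeps the result correct when the tracker occasionally drops or delays samples.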
In some embodiments, the disclosed systems may perform one or more filtering operation(s) 716 on gaze velocities 714 (e.g., to remove noise and/or unwanted segments before downstream event detection and feature extraction). In at least one embodiment, the disclosed systems may remove all samples where gaze velocity exceeds about 800 degrees/second, which may indicate implausibly fast eye movements. The disclosed systems may then replace the removed values through interpolation. Additionally or alternatively, the disclosed systems may apply a median filter (e.g., a median filter with a width of seven samples) to gaze velocities 714 to smooth the signal and/or reduce noise.
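The filtering steps above might look like the following sketch. It assumes uniformly spaced samples (so interpolation is done over sample indices), holds the nearest valid value when a removed sample sits at either end of the signal, and truncates the median window at the edges; these edge policies and the helper names are assumptions, not details from the disclosure.

```python
from statistics import median


def remove_and_interpolate(vel, max_deg_s=800.0):
    """Replace samples above max_deg_s by linear interpolation between valid neighbors."""
    valid = [i for i, v in enumerate(vel) if v <= max_deg_s]
    if not valid:
        raise ValueError("no plausible samples to interpolate from")
    out = list(vel)
    for i, v in enumerate(vel):
        if v <= max_deg_s:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = vel[right]  # leading gap: hold the first valid value
        elif right is None:
            out[i] = vel[left]   # trailing gap: hold the last valid value
        else:
            frac = (i - left) / (right - left)
            out[i] = vel[left] + frac * (vel[right] - vel[left])
    return out


def median_filter(x, width=7):
    """Centered running median; the window is truncated at the signal edges."""
    half = width // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]
```

Usage: `median_filter(remove_and_interpolate(velocities))` applies the two steps in the order described above.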
In some embodiments, the disclosed systems may generate gaze events 722 from gaze velocities 714 by performing one or more event-detection operation(s) 718. In some embodiments, the disclosed systems may detect fixation events (e.g., moments of maintaining visual gaze on a single location) and/or saccade events (e.g., moments of rapid eye movement between points of fixation) from gaze velocities 714 using any suitable detection model, algorithm, or heuristic. For example, the disclosed systems may perform saccade detection using a suitable saccade detection algorithm (e.g., Velocity-Threshold Identification (I-VT), Dispersion-Threshold Identification (I-DT), or Hidden Markov Model Identification (I-HMM)). In at least one embodiment, the disclosed systems may perform I-VT saccade detection by identifying runs of consecutive samples from gaze velocities 714 that exceed about 70 degrees/second. In some embodiments, the disclosed systems may require saccade events to have a minimum duration in the range of about 5 milliseconds to about 30 milliseconds (e.g., 17 milliseconds) and a maximum duration in the range of about 100 milliseconds to about 300 milliseconds (e.g., 200 milliseconds). In some embodiments, the disclosed systems may perform I-DT fixation detection by computing dispersion (e.g., the largest angular displacement from the centroid of gaze samples) over predetermined time windows and marking time windows where dispersion does not exceed about 1 degree as fixation events. In some embodiments, the disclosed systems may require fixation events to have a minimum duration in the range of about 50 milliseconds to about 200 milliseconds (e.g., 100 milliseconds) and a maximum duration in the range of about 0.5 seconds to about 3 seconds (e.g., 2 seconds).
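The sketch below illustrates I-VT saccade detection and I-DT fixation detection with the example thresholds given above (70 degrees/second and 17 to 200 ms for saccades; 1 degree and 100 ms to 2 s for fixations). It is an illustration under assumptions: saccade duration is measured from the first to the last above-threshold sample, gaze samples are unit 3-vectors with timestamps in milliseconds, and I-DT uses the classic growing-window variant, in which a window is extended until dispersion exceeds the threshold or the maximum duration is reached. All function names are hypothetical.

```python
import math


def ivt_saccades(vel, t_ms, threshold=70.0, min_ms=17.0, max_ms=200.0):
    """I-VT: (start_idx, end_idx) runs with velocity above threshold (deg/s)."""
    events, start = [], None
    for i, v in enumerate(list(vel) + [0.0]):  # trailing sentinel closes an open run
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            end = i - 1
            if min_ms <= t_ms[end] - t_ms[start] <= max_ms:
                events.append((start, end))
            start = None
    return events


def _angle_deg(u, v):
    # Same atan2 form as Eq. (1); u and v are unit vectors.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    s = math.sqrt(sum((a + b) ** 2 for a, b in zip(u, v)))
    return math.degrees(2.0 * math.atan2(d, s))


def dispersion_deg(gaze):
    """Largest angle between any gaze unit vector and the normalized centroid."""
    c = [sum(g[k] for g in gaze) / len(gaze) for k in range(3)]
    norm = math.sqrt(sum(x * x for x in c))
    c = [x / norm for x in c]
    return max(_angle_deg(g, c) for g in gaze)


def idt_fixations(gaze, t_ms, max_disp=1.0, min_ms=100.0, max_ms=2000.0):
    """I-DT: windows of at least min_ms whose dispersion stays <= max_disp degrees."""
    events, i, n = [], 0, len(gaze)
    while i < n:
        j = i
        while j < n and t_ms[j] - t_ms[i] < min_ms:  # smallest window spanning min_ms
            j += 1
        if j >= n:
            break
        if dispersion_deg(gaze[i:j + 1]) <= max_disp:
            # Grow the window while dispersion and duration stay within limits.
            while (j + 1 < n and t_ms[j + 1] - t_ms[i] <= max_ms
                   and dispersion_deg(gaze[i:j + 2]) <= max_disp):
                j += 1
            events.append((i, j))
            i = j + 1
        else:
            i += 1
    return events
```

With this growing-window variant, a fixation longer than `max_ms` is split into consecutive fixation events rather than discarded; other policies are equally plausible.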
In some embodiments, the disclosed systems may generate gaze features 724 by performing one or more event-extraction operation(s) 720 on gaze vectors 702, gaze vectors 706, angular displacements 710, gaze velocities 714, and/or any other suitable eye-tracking data. The disclosed systems may extract a variety of gaze-based features for use in predicting the onset of a user's intent to interact with a computing system. Examples of gaze-based features include, without limitation, gaze velocity (e.g., a measure of how fast gaze is moving), ambient attention, focal attention, saccade dynamics, gaze features that characterize visual attention, dispersion (e.g., a measure of how spread out gaze points are over a period of time), event-detection labels, low-level eye movement features derived from gaze events 722, the K coefficient (e.g., to discern between focal and ambient behavior), pupil dynamics (e.g., dynamics relating to and/or involving pupil diameter, pupil area, pupil ellipsoid axis (major and minor) lengths, and/or iris radius), variations or combinations of one or more of the same, or any other type or form of eye-tracking data.
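The K coefficient mentioned above is commonly defined in the eye-tracking literature as the mean, over fixations, of the z-scored fixation duration minus the z-scored amplitude of the saccade that follows it; positive values suggest focal viewing and negative values suggest ambient viewing. The sketch below assumes that definition and assumes baseline means and standard deviations supplied from a longer recording, because z-scoring with the window's own statistics would make K identically zero. The function name and arguments are hypothetical.

```python
from statistics import mean


def k_coefficient(fix_durations_ms, next_saccade_amps_deg,
                  dur_mean, dur_std, amp_mean, amp_std):
    """K = mean_i [ z(d_i) - z(a_{i+1}) ] over a window of fixation/saccade pairs.

    d_i is a fixation duration and a_{i+1} is the amplitude of the saccade that
    follows it; the z-scores use baseline statistics passed in by the caller.
    """
    if len(fix_durations_ms) != len(next_saccade_amps_deg) or not fix_durations_ms:
        raise ValueError("need one following-saccade amplitude per fixation")
    return mean((d - dur_mean) / dur_std - (a - amp_mean) / amp_std
                for d, a in zip(fix_durations_ms, next_saccade_amps_deg))
```

Long fixations followed by short saccades push K above zero (focal), while short fixations followed by long saccades push it below zero (ambient).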
The systems described herein may predict when a user intends to interact using a variety of gaze data and gaze dynamics. For example, the disclosed systems may predict moments of interaction using a combination of gaze velocity, low-level features from fixation and saccade events, and/or mid-level features that recognize patterns in the shape of scan paths. In some embodiments, the systems described herein may predict a user's intent based on patterns and/or elements of one or more of fixation events (e.g., whether or not a user is fixated on something), gaze velocity, fixation average velocity, saccade acceleration skew in the x direction, saccade standard deviation in the y direction, saccade velocity kurtosis, saccade velocity skew, saccade velocity skew in the y direction, saccade duration, ambient/focal K coefficient, saccade velocity standard deviation, saccade distance from previous saccade, dispersion, fixation duration, fixation kurtosis in the y direction, saccade velocity kurtosis in the x direction, saccade velocity skew in the x direction, saccade amplitude, saccade standard deviation in the x direction, fixation kurtosis in the x direction, saccade acceleration kurtosis in the y direction, saccade acceleration skew, fixation skew in the y direction, saccade acceleration kurtosis in the x direction, saccade events (e.g., whether or not a user is performing a saccade), saccade dispersion, fixation standard deviation in the x direction, fixation skew in the x direction, saccade velocity mean, fixation standard deviation in the y direction, saccade velocity kurtosis in the y direction, fixation angle from previous fixation, saccade angle from previous saccade, saccade velocity median in the x direction, fixation path length, saccade acceleration skew in the y direction, fixation dispersion, saccade acceleration kurtosis, saccade path length, saccade acceleration median in the y direction, saccade velocity mean in the x direction, saccade acceleration standard deviation in the x direction, saccade velocity mean in the y direction, saccade acceleration mean, saccade acceleration mean in the x direction, saccade acceleration median in the x direction, saccade acceleration standard deviation, saccade acceleration standard deviation in the y direction, saccade velocity standard deviation in the y direction, saccade acceleration maximum in the x direction, saccade velocity median, saccade velocity maximum in the x direction, saccade acceleration maximum, saccade acceleration median, saccade velocity median in the y direction, saccade acceleration mean in the y direction, saccade ratio, and/or saccade velocity standard deviation in the x direction. Additionally or alternatively, the systems described herein may predict a user's intent based on gaze velocity, any suitable measure of ambient/focal attention, statistical features of saccadic eye movements, blink patterns, scan path patterns, and/or changes to pupil features.
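Many of the saccade features listed above are summary statistics of a saccade's velocity or acceleration profile, computed overall or per axis. A sketch for the overall profile follows; applying the same `profile_stats` helper to per-axis velocity components would give the x- and y-direction variants. It assumes population (not sample) moments and excess kurtosis, choices the disclosure does not specify, and the names are hypothetical.

```python
from statistics import mean, median, pstdev


def profile_stats(values, prefix):
    """Mean, median, std, max, skew and excess kurtosis of one profile."""
    m, s, n = mean(values), pstdev(values), len(values)
    if s == 0:
        skew = kurt = 0.0
    else:
        skew = sum((v - m) ** 3 for v in values) / (n * s ** 3)
        kurt = sum((v - m) ** 4 for v in values) / (n * s ** 4) - 3.0
    return {f"{prefix}_mean": m, f"{prefix}_median": median(values),
            f"{prefix}_std": s, f"{prefix}_max": max(values),
            f"{prefix}_skew": skew, f"{prefix}_kurtosis": kurt}


def saccade_features(vel_deg_s, t_s):
    """Velocity and acceleration statistics for the samples of one saccade."""
    # Finite-difference acceleration between consecutive velocity samples.
    acc = [(v1 - v0) / (t1 - t0)
           for v0, v1, t0, t1 in zip(vel_deg_s, vel_deg_s[1:], t_s, t_s[1:])]
    feats = profile_stats(vel_deg_s, "saccade_velocity")
    feats.update(profile_stats(acc, "saccade_acceleration"))
    return feats
```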
Returning to
At step 530, one or more of the systems described herein may provide an intent-to-interact signal indicating the intent of the user to interact to an intelligent-facilitation subsystem in response to the intent of the user to interact. For example, signaling module 108 may, as part of wearable device 402 in
As illustrated in
As illustrated in
In some embodiments, the disclosed systems may use biosensors rather than environmental sensors to monitor physical attributes of a user that are environment agnostic, physical attributes of the user that are unrelated to the user's environment, and/or physical attributes of the user that are unrelated to an XR environment with which the user interacts. In some examples, the systems disclosed herein may monitor physical attributes of a user via any of physiological sensors 1000(1)-(N) in
As illustrated in
As illustrated in
As illustrated in
In some embodiments, the disclosed systems may notify a user-input model (e.g., fusion algorithm 1030 in
In some embodiments, the disclosed systems may, in response to an indication of a user's intent to interact with an extended-reality environment, display an interface element of a predictive interface to the user before the user interacts with the extended-reality environment. As shown in
In some embodiments, the disclosed systems may optimize an XR environment in response to an indication of a user's intent to interact with the XR environment. As shown in
As described above, the disclosed systems may use gaze data collected from an eye tracker as a rich source of clues for both what a user intends to interact with and when. In some embodiments, the disclosed systems may monitor natural gaze behavior in a transparent and unobtrusive manner. In some embodiments, the disclosed systems may use models that predict a user's intent to interact from eye movements to drive predictive XR interfaces that provide users with easy-to-use, minimally fatiguing XR interactions for all-day use.
Example 1: A computer-implemented method may include (1) acquiring, via one or more biosensors, one or more biosignals generated by a user of a computing system, (2) using the one or more biosignals to anticipate an intent of the user to interact with the computing system, and (3) providing an intent-to-interact signal indicating the intent of the user to interact to an intelligent-facilitation subsystem. In some examples, the computing system may include (1) at least one targeting subsystem that enables the user to explicitly target, for interaction, one or more objects, (2) at least one interaction subsystem that enables the user to interact with, when targeted, one or more of the objects, and (3) an intelligent-facilitation subsystem that targets one or more of the objects on behalf of the user in response to intent-to-interact signals.
Example 2: The computer-implemented method of Example 1 further including (1) identifying, by the intelligent-facilitation subsystem, at least one of the objects as being most likely to be interacted with by the user in response to receiving the intent-to-interact signal indicating the intent of the user to interact, (2) targeting, by the intelligent-facilitation subsystem, the at least one of the objects on behalf of the user, (3) receiving, from the user via the interaction subsystem, a request to interact with the at least one of the objects targeted by the intelligent-facilitation subsystem on behalf of the user, and (4) performing an operation in response to receiving the request to interact with the at least one of the objects.
Example 3: The computer-implemented method of any of Examples 1-2, wherein the intelligent-facilitation subsystem refrains from identifying the at least one of the objects until after receiving the intent-to-interact signal.
Example 4: The computer-implemented method of any of Examples 1-3 where (1) the one or more biosensors include one or more eye-tracking sensors, (2) the one or more biosignals include signals indicative of gaze dynamics of the user, and (3) the signals indicative of gaze dynamics of the user are used to anticipate the intent of the user to interact.
Example 5: The computer-implemented method of any of Examples 1-4 where the signals indicative of gaze dynamics of the user include a measure of gaze velocity.
Example 6: The computer-implemented method of any of Examples 1-5 where the signals indicative of gaze dynamics of the user include at least one of (1) a measure of ambient attention and/or (2) a measure of focal attention.
Example 7: The computer-implemented method of any of Examples 1-6 wherein the signals indicative of gaze dynamics of the user include a measure of saccade dynamics.
Example 8: The computer-implemented method of any of Examples 1-7 where (1) the one or more biosensors include one or more hand-tracking sensors, (2) the one or more biosignals include signals indicative of hand dynamics of the user, and (3) the signals indicative of hand dynamics of the user are used to anticipate the intent of the user to interact.
Example 9: The computer-implemented method of any of Examples 1-8 where (1) the one or more biosensors include one or more neuromuscular sensors, (2) the one or more biosignals include neuromuscular signals obtained from the user's body, and (3) the neuromuscular signals obtained from the user's body are used to anticipate the intent of the user to interact.
Example 10: The computer-implemented method of any of Examples 1-9 where the objects associated with the computing system include one or more physical objects from a real-world environment of the user.
Example 11: The computer-implemented method of any of Examples 1-10 where (1) the computing system is an extended-reality system, (2) the computer-implemented method further includes displaying, by the extended-reality system, virtual objects to the user, and (3) the objects associated with the computing system include the virtual objects.
Example 12: The computer-implemented method of any of Examples 1-11 where (1) the computing system includes an extended-reality system, (2) the computer-implemented method further includes displaying, by the extended-reality system, a menu to the user, and (3) the objects associated with the computing system include visual elements of the menu.
Example 13: The computer-implemented method of any of Examples 1-12 further including training a predictive model to output the intent-to-interact signals.
Example 14: A system may include (1) at least one targeting subsystem adapted to enable a user to explicitly target one or more objects for interaction, (2) at least one interaction subsystem adapted to enable the user to interact with, when targeted, one or more of the objects, (3) an intelligent-facilitation subsystem adapted to target the objects on behalf of the user in response to intent-to-interact signals, (4) one or more biosensors adapted to detect biosignals generated by the user, (5) at least one physical processor, and (6) physical memory including computer-executable instructions that, when executed by the physical processor, cause the physical processor to (a) acquire, via the one or more biosensors, the one or more biosignals generated by the user, (b) use the one or more biosignals to anticipate an intent of the user to interact with the system, and (c) provide an intent-to-interact signal indicating the intent of the user to interact with the system to the intelligent-facilitation subsystem in response to the intent of the user to interact with the system.
Example 15: The system of Example 14, where (1) the one or more biosensors include one or more eye-tracking sensors adapted to measure gaze dynamics of the user, (2) the one or more biosignals include signals indicative of the gaze dynamics of the user, and (3) the gaze dynamics of the user are used to anticipate the intent of the user to interact with the system.
Example 16: The system of any of Examples 14-15, where (1) the one or more biosensors include one or more hand-tracking sensors, (2) the one or more biosignals include signals indicative of hand dynamics of the user, and (3) the signals indicative of hand dynamics of the user are used to anticipate the intent of the user to interact with the computing system.
Example 17: The system of any of Examples 14-16, where (1) the one or more biosensors include one or more neuromuscular sensors, (2) the one or more biosignals include neuromuscular signals obtained from the user's body, and (3) the neuromuscular signals obtained from the user's body are used to anticipate the intent of the user to interact with the computing system.
Example 18: The system of any of Examples 14-17, where (1) the at least one targeting subsystem includes a pointing subsystem of a physical controller and (2) the at least one interaction subsystem includes a selecting subsystem of the physical controller.
Example 19: The system of any of Examples 14-18, where (1) the intelligent-facilitation subsystem is further adapted to (a) identify at least one of the objects as being most likely to be interacted with by the user in response to receiving the intent-to-interact signal indicating the intent of the user to interact with the computing system and (b) target the at least one of the objects on behalf of the user and (2) the physical memory further includes additional computer-executable instructions that, when executed by the physical processor, cause the physical processor to (a) receive, from the user via the interaction subsystem, a request to interact with the at least one of the objects targeted by the intelligent-facilitation subsystem and (b) perform an operation in response to receiving the request to interact with the at least one of the objects.
Example 20: A non-transitory computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to (1) acquire, via one or more biosensors, one or more biosignals generated by a user of the computing device, (2) use the one or more biosignals to anticipate an intent of the user to interact with the objects using the computing device, and (3) provide an intent-to-interact signal indicating an intent of the user to interact with the computing device to the intelligent-facilitation subsystem in response to the intent of the user to interact with the computing device. In some examples, the computing device may include (1) at least one targeting subsystem that enables the user to explicitly target, for interaction, one or more of the objects, (2) at least one interaction subsystem that enables the user to interact with, when targeted, one or more of the objects, and (3) an intelligent-facilitation subsystem that targets one or more of the objects on behalf of the user in response to intent-to-interact signals.
Example 21: A computer-implemented method for predicting an intent to interact may include (1) monitoring, via one or more sensors, one or more physical attributes of a user, (2) providing, as input, the one or more physical attributes of the user to a model trained to detect when the user intends to interact with an extended-reality environment, (3) receiving, as output from the model, an indication of the user's intent to interact with the extended-reality environment, and (4) performing, in response to the indication, an extended-reality operation before the user interacts with the extended-reality environment.
Example 22: The computer-implemented method of any of Examples 1-13 or 21, wherein (1) the one or more sensors comprise one or more eye-tracking sensors and (2) monitoring the one or more physical attributes of the user may include monitoring one or more gaze attributes of the user.
Example 23: The computer-implemented method of any of Examples 1-13, 21, and/or 22, wherein the one or more gaze attributes of the user include one or more of a fixation attribute, a gaze velocity attribute, a gaze acceleration attribute, and/or a saccade attribute.
Example 24: The computer-implemented method of any of Examples 1-13 and/or 21-23, wherein monitoring the one or more physical attributes of the user may include monitoring one or more neuromuscular attributes of the user.
Example 25: The computer-implemented method of any of Examples 1-13 and/or 21-24, wherein performing the extended-reality operation may include notifying an interaction model of the user's intent to interact with the extended-reality environment before the user interacts with the extended-reality environment.
Example 26: The computer-implemented method of any of Examples 1-13 and/or 21-25, wherein performing the extended-reality operation may include displaying an interface element to the user before the user interacts with the extended-reality environment.
Example 27: The computer-implemented method of any of Examples 1-13 and/or 21-26, wherein performing the extended-reality operation may include displaying an interface element for interacting with an object in the extended-reality environment to the user before the user interacts with the object in the extended-reality environment.
Example 28: The computer-implemented method of any of Examples 1-13 and/or 21-27, wherein performing the extended-reality operation may include (1) identifying, in response to the indication, an object in the extended-reality environment with which the user is most likely to interact and (2) displaying an interface element for interacting with the object in the extended-reality environment.
Example 29: The computer-implemented method of any of Examples 1-13 and/or 21-28, wherein performing the extended-reality operation may include loading, into memory, at least one asset most likely to be interacted with by the user before the user interacts with the at least one asset.
Example 30: The computer-implemented method of any of Examples 1-13 and/or 21-29, wherein (1) the indication of the user's intent may include a prediction that the user will perform a pinch gesture to interact with the extended-reality environment and (2) the extended-reality operation is performed before the user completes the pinch gesture.
Example 31: A system may include (1) at least one physical processor and (2) physical memory comprising computer-executable instructions that, when executed by the physical processor, cause the physical processor to (a) monitor, via one or more sensors, one or more physical attributes of a user, (b) provide, as input, the one or more physical attributes of the user to a model trained to detect when the user intends to interact with an extended-reality environment, (c) receive, as output from the model, an indication of the user's intent to interact with the extended-reality environment, and (d) perform, in response to the indication, an extended-reality operation before the user interacts with the extended-reality environment.
Example 32: The system of any of Examples 14-19 and/or 31, wherein (1) the one or more sensors comprise one or more eye-tracking sensors and (2) monitoring the one or more physical attributes of the user may include monitoring one or more gaze attributes of the user.
Example 33: The system of any of Examples 14-19, 31, and/or 32, wherein the one or more gaze attributes of the user may include one or more of a fixation attribute, a gaze velocity attribute, a gaze acceleration attribute, or a saccade attribute.
Example 34: The system of any of Examples 14-19 and/or 31-33, wherein monitoring the one or more physical attributes of the user may include monitoring one or more neuromuscular attributes of the user.
Example 35: The system of any of Examples 14-19 and/or 31-34, wherein performing the extended-reality operation may include notifying an interaction model of the user's intent to interact with the extended-reality environment before the user interacts with the extended-reality environment.
Example 36: The system of any of Examples 14-19 and/or 31-35, wherein performing the extended-reality operation may include displaying an interface element to the user before the user interacts with the extended-reality environment.
Example 37: The system of any of Examples 14-19 and/or 31-36, wherein performing the extended-reality operation may include displaying an interface element for interacting with an object in the extended-reality environment to the user before the user interacts with the object in the extended-reality environment.
Example 38: The system of any of Examples 14-19 and/or 31-37, wherein performing the extended-reality operation may include (1) identifying, in response to the indication, an object in the extended-reality environment with which the user is most likely to interact and (2) displaying an interface element for interacting with the object in the extended-reality environment.
Example 39: The system of any of Examples 14-19 and/or 31-38, wherein performing the extended-reality operation may include loading, into memory, at least one asset most likely to be interacted with by the user before the user interacts with the at least one asset.
Example 40: The system of any of Examples 14-19 and/or 31-39, wherein (1) the indication of the user's intent may include a prediction that the user will perform a pinch gesture to interact with the extended-reality environment and (2) the extended-reality operation is performed before the user completes the pinch gesture.
Example 41: A non-transitory computer-readable medium may include one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to (1) monitor, via one or more sensors, one or more physical attributes of a user, (2) provide, as input, the one or more physical attributes of the user to a model trained to detect when the user intends to interact with an extended-reality environment, (3) receive, as output from the model, an indication of the user's intent to interact with the extended-reality environment, and (4) perform, in response to the indication, an extended-reality operation before the user interacts with the extended-reality environment.
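The flow recited in Examples 31 and 41 is: monitor sensors, pass the physical attributes to a model, receive an indication of intent, and act before the interaction. The following Python sketch shows one way that flow could be arranged. Several pieces are hypothetical and are not taken from the disclosure: the feature vector, the `logistic_model` stand-in with fixed weights, the 0.5 decision threshold, and the per-asset likelihood scores. The extended-reality operation shown is preloading the asset the user is most likely to interact with, in the manner of Examples 29 and 39.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


def logistic_model(weights: Sequence[float], bias: float) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for the trained model: a fixed logistic regression."""
    def predict(features: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, features))
        # Numerically stable sigmoid (avoids overflow for large |z|).
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return predict


@dataclass
class IntentPipeline:
    """Monitor -> model -> indication -> pre-emptive extended-reality operation."""
    model: Callable[[Sequence[float]], float]  # returns P(user intends to interact)
    threshold: float = 0.5                     # hypothetical decision threshold
    preloaded: List[str] = field(default_factory=list)

    def step(self, features: Sequence[float], asset_scores: Dict[str, float]) -> bool:
        """Process one window of sensor-derived features; return True on an intent indication."""
        if self.model(features) < self.threshold:
            return False  # no indication of intent, so no operation is performed
        # Indication received: act before the interaction happens. Here the
        # operation is loading the asset the user is most likely to use.
        if asset_scores:
            best = max(asset_scores, key=asset_scores.get)
            if best not in self.preloaded:
                self.preloaded.append(best)
        return True
```

For example, with hypothetical features `[fixation_duration_s, gaze_speed_deg_s]`, `IntentPipeline(logistic_model([4.0, -0.05], bias=-1.0)).step([0.5, 5.0], {"menu": 0.2, "door_handle": 0.8})` returns True and preloads `"door_handle"`. A long fixation with low gaze speed pushes the predicted probability above the threshold.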
Embodiments of the present disclosure may include or be implemented in conjunction with various types of artificial-reality systems. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, for example, a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivative thereof. Artificial-reality content may include completely computer-generated content or computer-generated content combined with captured (e.g., real-world) content. The artificial-reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional (3D) effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, for example, create content in an artificial reality and/or are otherwise used in (e.g., to perform activities in) an artificial reality.
Artificial-reality systems may be implemented in a variety of different form factors and configurations. Some artificial-reality systems may be designed to work without near-eye displays (NEDs). Other artificial-reality systems may include an NED that also provides visibility into the real world (such as, e.g., augmented-reality system 1300 in
Turning to
In some embodiments, augmented-reality system 1300 may include one or more sensors, such as sensor 1340. Sensor 1340 may generate measurement signals in response to motion of augmented-reality system 1300 and may be located on substantially any portion of frame 1310. Sensor 1340 may represent one or more of a variety of different sensing mechanisms, such as a position sensor, an inertial measurement unit (IMU), a depth camera assembly, a structured light emitter and/or detector, or any combination thereof. In some embodiments, augmented-reality system 1300 may or may not include sensor 1340 or may include more than one sensor. In embodiments in which sensor 1340 includes an IMU, the IMU may generate calibration data based on measurement signals from sensor 1340. Examples of sensor 1340 may include, without limitation, accelerometers, gyroscopes, magnetometers, other suitable types of sensors that detect motion, sensors used for error correction of the IMU, or some combination thereof.
In some examples, augmented-reality system 1300 may also include a microphone array with a plurality of acoustic transducers 1320(A)-1320(J), referred to collectively as acoustic transducers 1320. Acoustic transducers 1320 may represent transducers that detect air pressure variations induced by sound waves. Each acoustic transducer 1320 may be configured to detect sound and convert the detected sound into an electronic format (e.g., an analog or digital format). The microphone array in
In some embodiments, one or more of acoustic transducers 1320(A)-(J) may be used as output transducers (e.g., speakers). For example, acoustic transducers 1320(A) and/or 1320(B) may be earbuds or any other suitable type of headphone or speaker.
The configuration of acoustic transducers 1320 of the microphone array may vary. While augmented-reality system 1300 is shown in
Acoustic transducers 1320(A) and 1320(B) may be positioned on different parts of the user's ear, such as behind the pinna, behind the tragus, and/or within the auricle or fossa. Alternatively, additional acoustic transducers 1320 may be positioned on or around the ear, in addition to acoustic transducers 1320 inside the ear canal. Having an acoustic transducer 1320 positioned next to an ear canal of a user may enable the microphone array to collect information on how sounds arrive at the ear canal. By positioning at least two of acoustic transducers 1320 on either side of a user's head (e.g., as binaural microphones), augmented-reality system 1300 may simulate binaural hearing and capture a 3D stereo sound field around a user's head. In some embodiments, acoustic transducers 1320(A) and 1320(B) may be connected to augmented-reality system 1300 via a wired connection 1330, and in other embodiments acoustic transducers 1320(A) and 1320(B) may be connected to augmented-reality system 1300 via a wireless connection (e.g., a BLUETOOTH connection). In still other embodiments, acoustic transducers 1320(A) and 1320(B) may not be used at all in conjunction with augmented-reality system 1300.
Acoustic transducers 1320 on frame 1310 may be positioned in a variety of different ways, including along the length of the temples, across the bridge, above or below display devices 1315(A) and 1315(B), or some combination thereof. Acoustic transducers 1320 may also be oriented such that the microphone array is able to detect sounds in a wide range of directions surrounding the user wearing the augmented-reality system 1300. In some embodiments, an optimization process may be performed during manufacturing of augmented-reality system 1300 to determine relative positioning of each acoustic transducer 1320 in the microphone array.
In some examples, augmented-reality system 1300 may include or be connected to an external device (e.g., a paired device), such as neckband 135. Neckband 135 generally represents any type or form of paired device. Thus, the following discussion of neckband 135 may also apply to various other paired devices, such as charging cases, smart watches, smart phones, wrist bands, other wearable devices, hand-held controllers, tablet computers, laptop computers, other external compute devices, etc.
As shown, neckband 135 may be coupled to eyewear device 1302 via one or more connectors. The connectors may be wired or wireless and may include electrical and/or non-electrical (e.g., structural) components. In some cases, eyewear device 1302 and neckband 135 may operate independently without any wired or wireless connection between them. While
Pairing external devices, such as neckband 135, with augmented-reality eyewear devices may enable the eyewear devices to achieve the form factor of a pair of glasses while still providing sufficient battery and computation power for expanded capabilities. Some or all of the battery power, computational resources, and/or additional features of augmented-reality system 1300 may be provided by a paired device or shared between a paired device and an eyewear device, thus reducing the weight, heat profile, and form factor of the eyewear device overall while still retaining desired functionality. For example, neckband 135 may allow components that would otherwise be included on an eyewear device to be included in neckband 135 since users may tolerate a heavier weight load on their shoulders than they would tolerate on their heads. Neckband 135 may also have a larger surface area over which to diffuse and disperse heat to the ambient environment. Thus, neckband 135 may allow for greater battery and computation capacity than might otherwise have been possible on a stand-alone eyewear device. Since weight carried in neckband 135 may be less invasive to a user than weight carried in eyewear device 1302, a user may tolerate wearing a lighter eyewear device and carrying or wearing the paired device for greater lengths of time than a user would tolerate wearing a heavy standalone eyewear device, thereby enabling users to more fully incorporate artificial-reality environments into their day-to-day activities.
Neckband 135 may be communicatively coupled with eyewear device 1302 and/or with other devices. These other devices may provide certain functions (e.g., tracking, localizing, depth mapping, processing, storage, etc.) to augmented-reality system 1300. In the embodiment of
Acoustic transducers 1320(I) and 1320(J) of neckband 135 may be configured to detect sound and convert the detected sound into an electronic format (analog or digital). In the embodiment of
Controller 1325 of neckband 135 may process information generated by the sensors on neckband 135 and/or augmented-reality system 1300. For example, controller 1325 may process information from the microphone array that describes sounds detected by the microphone array. For each detected sound, controller 1325 may perform a direction-of-arrival (DOA) estimation to estimate a direction from which the detected sound arrived at the microphone array. As the microphone array detects sounds, controller 1325 may populate an audio data set with the information. In embodiments in which augmented-reality system 1300 includes an inertial measurement unit, controller 1325 may compute all inertial and spatial calculations from the IMU located on eyewear device 1302. A connector may convey information between augmented-reality system 1300 and neckband 135 and between augmented-reality system 1300 and controller 1325. The information may be in the form of optical data, electrical data, wireless data, or any other transmittable data form. Moving the processing of information generated by augmented-reality system 1300 to neckband 135 may reduce weight and heat in eyewear device 1302, making it more comfortable for the user.
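The direction-of-arrival estimation performed by controller 1325 can be illustrated with the simplest case of two acoustic transducers. The following Python sketch estimates the time difference of arrival by cross-correlating the two signals and converts that delay into an angle, assuming a far-field source. The function name, the transducer spacing, and the 343 m/s speed of sound are assumptions for illustration. A practical microphone array with more transducers might instead use generalized cross-correlation (e.g., GCC-PHAT) or beamforming.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def estimate_doa_deg(left, right, fs, mic_spacing_m, c=SPEED_OF_SOUND_M_S):
    """Estimate a far-field direction of arrival from two microphone signals.

    Returns the angle in degrees from broadside (perpendicular to the line
    joining the microphones). Positive angles mean the sound reached `right`
    before `left`.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; index i corresponds to lag i - (len(right) - 1).
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    # Only lags up to the acoustic travel time across the array are physical.
    max_lag = int(np.ceil(mic_spacing_m / c * fs))
    valid = np.abs(lags) <= max_lag
    lag = lags[valid][np.argmax(corr[valid])]
    # Time difference of arrival -> path-length difference -> angle.
    tdoa_s = lag / fs
    sin_theta = np.clip(c * tdoa_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

Restricting the search to physically possible lags makes the sketch less likely to lock onto a spurious correlation peak. The angular resolution is limited by the sample period, which is one reason practical systems interpolate around the peak.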
Power source 1335 in neckband 135 may provide power to eyewear device 1302 and/or to neckband 135. Power source 1335 may include, without limitation, lithium-ion batteries, lithium-polymer batteries, primary lithium batteries, alkaline batteries, or any other form of power storage. In some cases, power source 1335 may be a wired power source. Including power source 1335 on neckband 135 instead of on eyewear device 1302 may help better distribute the weight and heat generated by power source 1335.
As noted, some artificial-reality systems may, instead of blending an artificial reality with actual reality, substantially replace one or more of a user's sensory perceptions of the real world with a virtual experience. One example of this type of system is a head-worn display system, such as virtual-reality system 1400 in
Artificial-reality systems may include a variety of types of visual feedback mechanisms. For example, display devices in augmented-reality system 1300 and/or virtual-reality system 1400 may include one or more liquid crystal displays (LCDs), light-emitting diode (LED) displays, microLED displays, organic LED (OLED) displays, digital light processing (DLP) micro-displays, liquid crystal on silicon (LCoS) micro-displays, and/or any other suitable type of display screen. These artificial-reality systems may include a single display screen for both eyes or may provide a display screen for each eye, which may allow for additional flexibility for varifocal adjustments or for correcting a user's refractive error. Some of these artificial-reality systems may also include optical subsystems having one or more lenses (e.g., concave or convex lenses, Fresnel lenses, adjustable liquid lenses, etc.) through which a user may view a display screen. These optical subsystems may serve a variety of purposes, including to collimate (e.g., make an object appear at a greater distance than its physical distance), to magnify (e.g., make an object appear larger than its actual size), and/or to relay (to, e.g., the viewer's eyes) light. These optical subsystems may be used in a non-pupil-forming architecture (such as a single lens configuration that directly collimates light but results in so-called pincushion distortion) and/or a pupil-forming architecture (such as a multi-lens configuration that produces so-called barrel distortion to nullify pincushion distortion).
In addition to or instead of using display screens, some of the artificial-reality systems described herein may include one or more projection systems. For example, display devices in augmented-reality system 1300 and/or virtual-reality system 1400 may include micro-LED projectors that project light (using, e.g., a waveguide) into display devices, such as clear combiner lenses that allow ambient light to pass through. The display devices may refract the projected light toward a user's pupil and may enable a user to simultaneously view both artificial-reality content and the real world. The display devices may accomplish this using any of a variety of different optical components, including waveguide components (e.g., holographic, planar, diffractive, polarized, and/or reflective waveguide elements), light-manipulation surfaces and elements (such as diffractive, reflective, and refractive elements and gratings), coupling elements, etc. Artificial-reality systems may also be configured with any other suitable type or form of image projection system, such as retinal projectors used in virtual retina displays.
The artificial-reality systems described herein may also include various types of computer vision components and subsystems. For example, augmented-reality system 1300 and/or virtual-reality system 1400 may include one or more optical sensors, such as two-dimensional (2D) or 3D cameras, structured light transmitters and detectors, time-of-flight depth sensors, single-beam or sweeping laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor. An artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions.
The artificial-reality systems described herein may also include one or more input and/or output audio transducers. Output audio transducers may include voice coil speakers, ribbon speakers, electrostatic speakers, piezoelectric speakers, bone conduction transducers, cartilage conduction transducers, tragus-vibration transducers, and/or any other suitable type or form of audio transducer. Similarly, input audio transducers may include condenser microphones, dynamic microphones, ribbon microphones, and/or any other type or form of input transducer. In some embodiments, a single transducer may be used for both audio input and audio output.
In some embodiments, the artificial-reality systems described herein may also include tactile (i.e., haptic) feedback systems, which may be incorporated into headwear, gloves, body suits, handheld controllers, environmental devices (e.g., chairs, floormats, etc.), and/or any other type of device or system. Haptic feedback systems may provide various types of cutaneous feedback, including vibration, force, traction, texture, and/or temperature. Haptic feedback systems may also provide various types of kinesthetic feedback, such as motion and compliance. Haptic feedback may be implemented using motors, piezoelectric actuators, fluidic systems, and/or a variety of other types of feedback mechanisms. Haptic feedback systems may be implemented independently of other artificial-reality devices, within other artificial-reality devices, and/or in conjunction with other artificial-reality devices.
By providing haptic sensations, audible content, and/or visual content, artificial-reality systems may create an entire virtual experience or enhance a user's real-world experience in a variety of contexts and environments. For instance, artificial-reality systems may assist or extend a user's perception, memory, or cognition within a particular environment. Some systems may enhance a user's interactions with other people in the real world or may enable more immersive interactions with other people in a virtual world. Artificial-reality systems may also be used for educational purposes (e.g., for teaching or training in schools, hospitals, government organizations, military organizations, business enterprises, etc.), entertainment purposes (e.g., for playing video games, listening to music, watching video content, etc.), and/or for accessibility purposes (e.g., as hearing aids, visual aids, etc.). The embodiments disclosed herein may enable or enhance a user's artificial-reality experience in one or more of these contexts and environments and/or in other contexts and environments.
Some augmented-reality systems may map a user's and/or device's environment using techniques referred to as “simultaneous localization and mapping” (SLAM). SLAM mapping and location identifying techniques may involve a variety of hardware and software tools that can create or update a map of an environment while simultaneously keeping track of a user's location within the mapped environment. SLAM may use many different types of sensors to create a map and determine a user's position within the map.
SLAM techniques may, for example, use optical sensors to determine a user's location. Radios including WiFi, BLUETOOTH, global positioning system (GPS), cellular or other communication devices may also be used to determine a user's location relative to a radio transceiver or group of transceivers (e.g., a WiFi router or group of GPS satellites). Acoustic sensors such as microphone arrays or 2D or 3D sonar sensors may also be used to determine a user's location within an environment. Augmented-reality and virtual-reality devices (such as systems 1300 and 1400 of
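Determining a user's location relative to a group of transceivers can be illustrated with range-based trilateration. The following Python sketch assumes that a distance to each transceiver at a known position has already been estimated, for example from signal time of flight. It solves for the position by linearized least squares. The function name is hypothetical. Real GPS receivers also solve for their own clock bias from pseudoranges, which this sketch does not model.

```python
import numpy as np


def trilaterate(anchors, ranges):
    """Least-squares position from distances to known transceiver positions.

    anchors: transceiver coordinates, shape (M, D) with M >= D + 1
    ranges: measured distances to each transceiver, shape (M,)
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    p0, r0 = anchors[0], ranges[0]
    # Subtracting the first sphere equation |x - p0|^2 = r0^2 from the others
    # cancels |x|^2, leaving the linear system 2 (p_i - p0) . x = b_i.
    A = 2.0 * (anchors[1:] - p0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         - ranges[1:] ** 2 + r0 ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position
```

With more transceivers than unknowns, the least-squares solve averages out some range noise. This is one reason additional transceivers can improve a location estimate.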
As noted, artificial-reality systems 1300 and 1400 may be used with a variety of other types of devices to provide a more compelling artificial-reality experience. These devices may be haptic interfaces with transducers that provide haptic feedback and/or that collect haptic information about a user's interaction with an environment. The artificial-reality systems disclosed herein may include various types of haptic interfaces that detect or convey various types of haptic information, including tactile feedback (e.g., feedback that a user detects via nerves in the skin, which may also be referred to as cutaneous feedback) and/or kinesthetic feedback (e.g., feedback that a user detects via receptors located in muscles, joints, and/or tendons).
Haptic feedback may be provided by interfaces positioned within a user's environment (e.g., chairs, tables, floors, etc.) and/or interfaces on articles that may be worn or carried by a user (e.g., gloves, wristbands, etc.). As an example,
One or more vibrotactile devices 1540 may be positioned at least partially within one or more corresponding pockets formed in textile material 1530 of vibrotactile system 1500. Vibrotactile devices 1540 may be positioned in locations to provide a vibrating sensation (e.g., haptic feedback) to a user of vibrotactile system 1500. For example, vibrotactile devices 1540 may be positioned against the user's finger(s), thumb, or wrist, as shown in
A power source 1550 (e.g., a battery) for applying a voltage to the vibrotactile devices 1540 for activation thereof may be electrically coupled to vibrotactile devices 1540, such as via conductive wiring 1552. In some examples, each of vibrotactile devices 1540 may be independently electrically coupled to power source 1550 for individual activation. In some embodiments, a processor 1560 may be operatively coupled to power source 1550 and configured (e.g., programmed) to control activation of vibrotactile devices 1540.
Vibrotactile system 1500 may be implemented in a variety of ways. In some examples, vibrotactile system 1500 may be a standalone system with integral subsystems and components for operation independent of other devices and systems. As another example, vibrotactile system 1500 may be configured for interaction with another device or system 1570. For example, vibrotactile system 1500 may, in some examples, include a communications interface 1580 for receiving and/or sending signals to the other device or system 1570. The other device or system 1570 may be a mobile device, a gaming console, an artificial-reality (e.g., virtual-reality, augmented-reality, mixed-reality) device, a personal computer, a tablet computer, a network device (e.g., a modem, a router, etc.), a handheld controller, etc. Communications interface 1580 may enable communications between vibrotactile system 1500 and the other device or system 1570 via a wireless (e.g., Wi-Fi, BLUETOOTH, cellular, radio, etc.) link or a wired link. If present, communications interface 1580 may be in communication with processor 1560, such as to provide a signal to processor 1560 to activate or deactivate one or more of the vibrotactile devices 1540.
Vibrotactile system 1500 may optionally include other subsystems and components, such as touch-sensitive pads 1590, pressure sensors, motion sensors, position sensors, lighting elements, and/or user interface elements (e.g., an on/off button, a vibration control element, etc.). During use, vibrotactile devices 1540 may be configured to be activated for a variety of different reasons, such as in response to the user's interaction with user interface elements, a signal from the motion or position sensors, a signal from the touch-sensitive pads 1590, a signal from the pressure sensors, a signal from the other device or system 1570, etc.
Although power source 1550, processor 1560, and communications interface 1580 are illustrated in
Haptic wearables, such as those shown in and described in connection with
Head-mounted display 1602 generally represents any type or form of virtual-reality system, such as virtual-reality system 1400 in
While haptic interfaces may be used with virtual-reality systems, as shown in
One or more of band elements 1732 may include any type or form of actuator suitable for providing haptic feedback. For example, one or more of band elements 1732 may be configured to provide one or more of various types of cutaneous feedback, including vibration, force, traction, texture, and/or temperature. To provide such feedback, band elements 1732 may include one or more of various types of actuators. In one example, each of band elements 1732 may include a vibrotactor (e.g., a vibrotactile actuator) configured to vibrate in unison or independently to provide one or more of various types of haptic sensations to a user. Alternatively, only a single band element or a subset of band elements may include vibrotactors.
Haptic devices 1510, 1520, 164, and 1730 may include any suitable number and/or type of haptic transducer, sensor, and/or feedback mechanism. For example, haptic devices 1510, 1520, 164, and 1730 may include one or more mechanical transducers, piezoelectric transducers, and/or fluidic transducers. Haptic devices 1510, 1520, 164, and 1730 may also include various combinations of different types and forms of transducers that work together or independently to enhance a user's artificial-reality experience. In one example, each of band elements 1732 of haptic device 1730 may include a vibrotactor (e.g., a vibrotactile actuator) configured to vibrate in unison or independently to provide one or more of various types of haptic sensations to a user.
In some embodiments, the systems described herein may also include an eye-tracking subsystem designed to identify and track various characteristics of a user's eye(s), such as the user's gaze direction. The phrase “eye tracking” may, in some examples, refer to a process by which the position, orientation, and/or motion of an eye is measured, detected, sensed, determined, and/or monitored. The disclosed systems may measure the position, orientation, and/or motion of an eye in a variety of different ways, including through the use of various optical-based eye-tracking techniques, ultrasound-based eye-tracking techniques, etc. An eye-tracking subsystem may be configured in a number of different ways and may include a variety of different eye-tracking hardware components or other computer-vision components. For example, an eye-tracking subsystem may include a variety of different optical sensors, such as two-dimensional (2D) or 3D cameras, time-of-flight depth sensors, single-beam or sweeping laser rangefinders, 3D LiDAR sensors, and/or any other suitable type or form of optical sensor. In this example, a processing subsystem may process data from one or more of these sensors to measure, detect, determine, and/or otherwise monitor the position, orientation, and/or motion of the user's eye(s).
In some embodiments, optical subsystem 184 may receive the light generated by light source 1802 and generate, based on the received light, converging light 1820 that includes the image. In some examples, optical subsystem 184 may include any number of lenses (e.g., Fresnel lenses, convex lenses, concave lenses), apertures, filters, mirrors, prisms, and/or other optical components, possibly in combination with actuators and/or other devices. In particular, the actuators and/or other devices may translate and/or rotate one or more of the optical components to alter one or more aspects of converging light 1820. Further, various mechanical couplings may serve to maintain the relative spacing and/or the orientation of the optical components in any suitable combination.
In one embodiment, eye-tracking subsystem 186 may generate tracking information indicating a gaze angle of an eye 1801 of the viewer. In this embodiment, control subsystem 188 may control aspects of optical subsystem 184 (e.g., the angle of incidence of converging light 1820) based at least in part on this tracking information. Additionally, in some examples, control subsystem 188 may store and utilize historical tracking information (e.g., a history of the tracking information over a given duration, such as the previous second or fraction thereof) to anticipate the gaze angle of eye 1801 (e.g., an angle between the visual axis and the anatomical axis of eye 1801). In some embodiments, eye-tracking subsystem 186 may detect radiation emanating from some portion of eye 1801 (e.g., the cornea, the iris, the pupil, or the like) to determine the current gaze angle of eye 1801. In other examples, eye-tracking subsystem 186 may employ a wavefront sensor to track the current location of the pupil.
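The use of historical tracking information to anticipate a gaze angle can be illustrated with a very simple predictor. The following Python sketch fits a straight line to the gaze-angle samples from the previous second and extrapolates it a short time into the future. The linear motion model, the one-second window, and the function name are assumptions for illustration only. A Kalman filter or a learned model are common alternatives.

```python
import numpy as np


def predict_gaze_angle(times_s, angles_deg, horizon_s, window_s=1.0):
    """Extrapolate a gaze angle `horizon_s` seconds past the latest sample.

    Fits a straight line to the samples from the last `window_s` seconds
    (at least two are required) and evaluates it at the future time.
    """
    t = np.asarray(times_s, dtype=float)
    a = np.asarray(angles_deg, dtype=float)
    recent = t >= t[-1] - window_s
    if np.count_nonzero(recent) < 2:
        raise ValueError("need at least two samples inside the history window")
    slope, intercept = np.polyfit(t[recent], a[recent], deg=1)
    return slope * (t[-1] + horizon_s) + intercept
```

Because only the recent window is fitted, a change in gaze direction earlier in the history does not bias the prediction. During a saccade, however, a linear model may undershoot or overshoot.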
Any number of techniques can be used to track eye 1801. Some techniques may involve illuminating eye 1801 with infrared light and measuring reflections with at least one optical sensor that is tuned to be sensitive to the infrared light. Information about how the infrared light is reflected from eye 1801 may be analyzed to determine the position(s), orientation(s), and/or motion(s) of one or more eye feature(s), such as the cornea, pupil, iris, and/or retinal blood vessels.
In some examples, the radiation captured by a sensor of eye-tracking subsystem 186 may be digitized (i.e., converted to an electronic signal). Further, the sensor may transmit a digital representation of this electronic signal to one or more processors (for example, processors associated with a device including eye-tracking subsystem 186). Eye-tracking subsystem 186 may include any of a variety of sensors in a variety of different configurations. For example, eye-tracking subsystem 186 may include an infrared detector that reacts to infrared radiation. The infrared detector may be a thermal detector, a photonic detector, and/or any other suitable type of detector. Thermal detectors may include detectors that react to thermal effects of the incident infrared radiation.
In some examples, one or more processors may process the digital representation generated by the sensor(s) of eye-tracking subsystem 186 to track the movement of eye 1801. In another example, these processors may track the movements of eye 1801 by executing algorithms represented by computer-executable instructions stored on non-transitory memory. In some examples, on-chip logic (e.g., an application-specific integrated circuit or ASIC) may be used to perform at least portions of such algorithms. As noted, eye-tracking subsystem 186 may be programmed to use an output of the sensor(s) to track movement of eye 1801. In some embodiments, eye-tracking subsystem 186 may analyze the digital representation generated by the sensors to extract eye rotation information from changes in reflections. In one embodiment, eye-tracking subsystem 186 may use corneal reflections or glints (also known as Purkinje images) and/or the center of the eye's pupil 1822 as features to track over time.
In some embodiments, eye-tracking subsystem 186 may use infrared or near-infrared, non-collimated light to create corneal reflections and may track these reflections together with the center of the eye's pupil 1822. In these embodiments, eye-tracking subsystem 186 may use the vector between the center of the eye's pupil 1822 and the corneal reflections to compute the gaze direction of eye 1801. In some embodiments, the disclosed systems may perform a calibration procedure for an individual (using, e.g., supervised or unsupervised techniques) before tracking the user's eyes. For example, the calibration procedure may include directing users to look at one or more points displayed on a display while the eye-tracking system records the values that correspond to each gaze position associated with each point.
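The pupil-center/corneal-reflection (PCCR) vector and the calibration procedure can be sketched as follows. This is an illustrative sketch, not the disclosed implementation. It assumes NumPy, 2D image coordinates for the pupil center and a single glint, and an affine mapping from the pupil-glint vector to display coordinates fitted by least squares over the calibration points. Many systems use a higher-order polynomial instead. The function names are hypothetical.

```python
import numpy as np


def pccr_vectors(pupils, glints):
    """Pupil-center minus corneal-reflection vector for each sample (N x 2)."""
    return np.asarray(pupils, dtype=float) - np.asarray(glints, dtype=float)


def fit_calibration(vectors, targets):
    """Least-squares affine map from PCCR vectors to known on-display target points."""
    v = np.asarray(vectors, dtype=float)
    design = np.hstack([v, np.ones((len(v), 1))])  # columns: vx, vy, 1
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float), rcond=None)
    return coeffs  # shape (3, 2): one column per display axis


def estimate_gaze(vector, coeffs):
    """Map a single PCCR vector to an estimated on-display gaze position."""
    vx, vy = vector
    return np.array([vx, vy, 1.0]) @ coeffs
```

With more than three non-collinear calibration points the fit is overdetermined, so measurement noise at individual points is averaged rather than reproduced exactly.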
In some embodiments, eye-tracking subsystem 186 may use two types of infrared and/or near-infrared (also known as active light) eye-tracking techniques: bright-pupil and dark-pupil eye tracking, which may be differentiated based on the location of an illumination source with respect to the optical elements used. If the illumination is coaxial with the optical path, then eye 1801 may act as a retroreflector as the light reflects off the retina, thereby creating a bright pupil effect similar to a red-eye effect in photography. If the illumination source is offset from the optical path, then the eye's pupil 1822 may appear dark because the retroreflection from the retina is directed away from the sensor. In some embodiments, bright-pupil tracking may create greater iris/pupil contrast, allowing more robust eye tracking regardless of iris pigmentation, and may feature reduced interference (e.g., interference caused by eyelashes and other obscuring features). Bright-pupil tracking may also allow tracking in lighting conditions ranging from total darkness to a very bright environment.
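One common way to exploit both illumination geometries, not stated in the text above and shown here only as an illustration, is to subtract a dark-pupil frame from a bright-pupil frame: the retroreflecting pupil changes strongly while the rest of the eye changes little. The sketch assumes NumPy, two registered frames captured in quick succession, and a hypothetical difference `threshold`.

```python
import numpy as np


def pupil_mask_from_difference(bright_frame, dark_frame, threshold=50):
    """Pixels much brighter under coaxial than off-axis illumination are labeled pupil."""
    diff = np.asarray(bright_frame, dtype=np.int32) - np.asarray(dark_frame, dtype=np.int32)
    return diff > threshold


def pupil_center(mask):
    """Return the (x, y) centroid of the pupil mask, or None if the mask is empty."""
    coords = np.argwhere(mask)  # (row, col) pairs
    if coords.size == 0:
        return None
    row, col = coords.mean(axis=0)
    return float(col), float(row)
```

Signed 32-bit arithmetic is used so that the subtraction of 8-bit frames cannot wrap around.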
In some embodiments, control subsystem 188 may control light source 1802 and/or optical subsystem 184 to reduce optical aberrations (e.g., chromatic aberrations and/or monochromatic aberrations) of the image that may be caused by or influenced by eye 1801. In some examples, as mentioned above, control subsystem 188 may use the tracking information from eye-tracking subsystem 186 to perform such control. For example, in controlling light source 1802, control subsystem 188 may alter the light generated by light source 1802 (e.g., by way of image rendering) to modify (e.g., pre-distort) the image so that the aberration of the image caused by eye 1801 is reduced.
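The pre-distortion idea can be illustrated with a simple radial model. This sketch assumes, purely for illustration, that the combined effect of the optics and eye 1801 displaces a normalized image point radially as r → r(1 + k1·r²). The coefficient `k1` and the function names are hypothetical; in practice the model would come from calibration and could be updated using the tracking information. Rendering each point at the pre-distorted radius cancels the assumed distortion. Using a separate `k1` per color channel is one way the same structure could address lateral chromatic aberration.

```python
import math


def distort_radius(r: float, k1: float) -> float:
    """Assumed radial aberration model: r -> r * (1 + k1 * r**2)."""
    return r * (1.0 + k1 * r * r)


def predistort_point(x: float, y: float, k1: float, iterations: int = 20):
    """Move a normalized image point so that the assumed distortion maps it back to (x, y)."""
    r = math.hypot(x, y)
    if r == 0.0:
        return x, y  # the optical center is unaffected by a radial model
    rp = r
    for _ in range(iterations):
        # Fixed-point iteration solving rp * (1 + k1 * rp**2) = r for rp.
        rp = r / (1.0 + k1 * rp * rp)
    scale = rp / r
    return x * scale, y * scale
```

The fixed-point iteration converges quickly for the small distortion coefficients typical of near-eye optics; a Newton step could replace it if stronger distortion had to be handled.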
The disclosed systems may track both the position and relative size of the pupil (since, e.g., the pupil dilates and/or contracts). In some examples, the eye-tracking devices and components (e.g., sensors and/or sources) used for detecting and/or tracking the pupil may be different (or calibrated differently) for different types of eyes. For example, the frequency range of the sensors may be different (or separately calibrated) for eyes of different colors and/or different pupil types, sizes, and/or the like. As such, the various eye-tracking components (e.g., infrared sources and/or sensors) described herein may need to be calibrated for each individual user and/or eye.
The disclosed systems may track both eyes with and without ophthalmic correction, such as that provided by contact lenses worn by the user. In some embodiments, ophthalmic correction elements (e.g., adjustable lenses) may be directly incorporated into the artificial reality systems described herein. In some examples, the color of the user's eye may necessitate modification of a corresponding eye-tracking algorithm. For example, eye-tracking algorithms may need to be modified based at least in part on the differing color contrast between a brown eye and, for example, a blue eye.
Sensor 196 generally represents any type or form of element capable of detecting radiation, such as radiation reflected off the user's eye 1902. Examples of sensor 196 include, without limitation, a charge-coupled device (CCD), a photodiode array, a complementary metal-oxide-semiconductor (CMOS) based sensor device, and/or the like. In one example, sensor 196 may represent a sensor having predetermined parameters, including, but not limited to, a dynamic resolution range, linearity, and/or other characteristics selected and/or designed specifically for eye tracking.
As detailed above, eye-tracking subsystem 1900 may generate one or more glints. A glint 193 may represent a reflection of radiation (e.g., infrared radiation from an infrared source, such as source 194) from the structure of the user's eye. In various embodiments, glint 193 and/or the user's pupil may be tracked using an eye-tracking algorithm executed by a processor (either within or external to an artificial reality device). For example, an artificial reality device may include a processor and/or a memory device to perform eye tracking locally and/or a transceiver to send and receive the data necessary to perform eye tracking on an external device (e.g., a mobile phone, cloud server, or other computing device).
In one example, eye-tracking subsystem 1900 may be configured to identify and measure the inter-pupillary distance (IPD) of a user. In some embodiments, eye-tracking subsystem 1900 may measure and/or calculate the IPD of the user while the user is wearing the artificial reality system. In these embodiments, eye-tracking subsystem 1900 may detect the positions of a user's eyes and may use this information to calculate the user's IPD.
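Computing the IPD from detected eye positions reduces to a distance measurement. The sketch below assumes 3D pupil centers in millimeters, expressed in a shared headset coordinate frame. The function names are hypothetical, and taking the median over several frames is an illustrative choice for rejecting outliers such as blinks or dropped detections.

```python
import math
from statistics import median


def inter_pupillary_distance(left_pupil, right_pupil):
    """Euclidean distance between 3D pupil centers, in the same units as the inputs."""
    return math.dist(left_pupil, right_pupil)


def estimate_ipd(samples):
    """Median IPD over (left, right) pupil-center pairs from several frames."""
    return median(inter_pupillary_distance(left, right) for left, right in samples)
```

Measuring while the user looks at a distant target keeps convergence from shrinking the pupil separation.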
As noted, the eye-tracking systems or subsystems disclosed herein may track a user's eye position and/or eye movement in a variety of ways. In one example, one or more light sources and/or optical sensors may capture an image of the user's eyes. The eye-tracking subsystem may then use the captured information to determine the user's inter-pupillary distance, interocular distance, and/or a 3D position of each eye (e.g., for distortion adjustment purposes), including a magnitude of torsion and rotation (i.e., roll, pitch, and yaw) and/or gaze directions for each eye. In one example, infrared light may be emitted by the eye-tracking subsystem and reflected from each eye. The reflected light may be received or detected by an optical sensor and analyzed to extract eye rotation data from changes in the infrared light reflected by each eye.
The eye-tracking subsystem may use any of a variety of different methods to track the eyes of a user. For example, a light source (e.g., infrared light-emitting diodes) may emit a dot pattern onto each eye of the user. The eye-tracking subsystem may then detect (e.g., via an optical sensor coupled to the artificial reality system) and analyze a reflection of the dot pattern from each eye of the user to identify a location of each pupil of the user. Accordingly, the eye-tracking subsystem may track up to six degrees of freedom of each eye (i.e., 3D position, roll, pitch, and yaw) and at least a subset of the tracked quantities may be combined from two eyes of a user to estimate a gaze point (i.e., a 3D location or position in a virtual scene where the user is looking) and/or an IPD.
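Because noisy gaze lines from the two eyes rarely intersect exactly, one common way to combine them into a single 3D gaze point is to take the midpoint of their segment of closest approach. This sketch rests on assumptions not stated in the text: each eye's gaze is already available as a line (origin and direction) in a common headset frame, and the function names are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(p: Vec3, d: Vec3, t: float) -> Vec3:
    return (p[0] + t * d[0], p[1] + t * d[1], p[2] + t * d[2])


def gaze_point(p1: Vec3, d1: Vec3, p2: Vec3, d2: Vec3, eps: float = 1e-9) -> Optional[Vec3]:
    """Midpoint of the closest-approach segment between two gaze lines.

    Returns None when the lines are (nearly) parallel, i.e. gaze at optical infinity.
    """
    w0 = (p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps * a * c:
        return None
    t = (b * e - c * d) / denom  # parameter of the closest point on the first line
    s = (a * e - b * d) / denom  # parameter of the closest point on the second line
    q1, q2 = _along(p1, d1, t), _along(p2, d2, s)
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))
```

For eyes at x = ±32 mm both fixating a point 500 mm straight ahead, the function returns (0, 0, 500).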
In some cases, the distance between a user's pupil and a display may change as the user's eye moves to look in different directions. The varying distance between a pupil and a display as viewing direction changes may be referred to as “pupil swim” and may contribute to distortion perceived by the user as a result of light focusing in different locations as the distance between the pupil and the display changes. Accordingly, measuring distortion at different eye positions and pupil distances relative to displays and generating distortion corrections for different positions and distances may allow mitigation of distortion caused by pupil swim by tracking the 3D position of a user's eyes and applying a distortion correction corresponding to the 3D position of each of the user's eyes at a given point in time. Thus, knowing the 3D position of each of a user's eyes may allow for the mitigation of distortion caused by changes in the distance between the pupil of the eye and the display by applying a distortion correction for each 3D eye position. Furthermore, as noted above, knowing the position of each of the user's eyes may also enable the eye-tracking subsystem to make automated adjustments for a user's IPD.
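The correction lookup described above can be sketched in one dimension. This simplification assumes that the relevant quantity is the pupil-to-display distance and that distortion was measured at a handful of distances during calibration. The table values and the function name are hypothetical, and a full implementation would interpolate over the 3D eye position rather than a single distance.

```python
from bisect import bisect_right

# Hypothetical calibration table: pupil-to-display distance (mm) -> radial distortion
# coefficient measured at that distance. Values are illustrative only.
CALIBRATION = [(10.0, 0.080), (12.0, 0.095), (14.0, 0.112), (16.0, 0.130)]


def correction_for_distance(distance_mm, table=CALIBRATION):
    """Linearly interpolate the correction for the current distance, clamping at the ends."""
    distances = [d for d, _ in table]
    if distance_mm <= distances[0]:
        return table[0][1]
    if distance_mm >= distances[-1]:
        return table[-1][1]
    i = bisect_right(distances, distance_mm)
    (d0, k0), (d1, k1) = table[i - 1], table[i]
    t = (distance_mm - d0) / (d1 - d0)
    return k0 + t * (k1 - k0)
```

Clamping at the table ends avoids extrapolating a correction outside the range that was actually measured.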
In some embodiments, a display subsystem may include a variety of additional subsystems that may work in conjunction with the eye-tracking subsystems described herein. For example, a display subsystem may include a varifocal subsystem, a scene-rendering module, and/or a vergence-processing module. The varifocal subsystem may cause left and right display elements to vary the focal distance of the display device. In one embodiment, the varifocal subsystem may physically change the distance between a display and the optics through which it is viewed by moving the display, the optics, or both. Additionally, moving or translating two lenses relative to each other may also be used to change the focal distance of the display. Thus, the varifocal subsystem may include actuators or motors that move displays and/or optics to change the distance between them. This varifocal subsystem may be separate from or integrated into the display subsystem. The varifocal subsystem may also be integrated into or separate from its actuation subsystem and/or the eye-tracking subsystems described herein.
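How far a varifocal actuator must move can be estimated with the thin-lens equation. The sketch assumes a single ideal thin lens of focal length f, with the display inside its focal length so that the viewer sees a virtual image at distance D. From 1/f = 1/s_o + 1/s_i with s_i = -D, the display-to-lens distance is s_o = fD/(f + D). The function name and the 40 mm focal length below are hypothetical, and real headset optics would need a calibrated model.

```python
import math


def display_distance_for_focus(focal_length_m: float, image_distance_m: float) -> float:
    """Display-to-lens distance that places a virtual image at image_distance_m.

    Thin-lens equation 1/f = 1/s_o + 1/s_i with a virtual image (s_i = -D)
    gives s_o = f * D / (f + D).
    """
    if math.isinf(image_distance_m):
        return focal_length_m  # display at the focal plane: image at optical infinity
    f, D = focal_length_m, image_distance_m
    return f * D / (f + D)
```

For a 40 mm lens, moving the virtual image from infinity to 1 m requires shortening the display-to-lens distance by about 1.54 mm.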
In one example, the display subsystem may include a vergence-processing module configured to determine a vergence depth of a user's gaze based on a gaze point and/or an estimated intersection of the gaze lines determined by the eye-tracking subsystem. Vergence may refer to the simultaneous movement or rotation of both eyes in opposite directions to maintain single binocular vision, which may be naturally and automatically performed by the human eye. Thus, a location where a user's eyes are verged is where the user is looking and is also typically the location where the user's eyes are focused. For example, the vergence-processing module may triangulate gaze lines to estimate a distance or depth from the user associated with the intersection of the gaze lines. The depth associated with the intersection of the gaze lines may then be used as an approximation for the accommodation distance, which may identify a distance from the user where the user's eyes are directed. Thus, the vergence distance may allow for the determination of a location where the user's eyes should be focused and a depth from the user's eyes at which the eyes are focused, thereby providing information (such as an object or plane of focus) for rendering adjustments to the virtual scene.
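In the horizontal plane, this triangulation reduces to a closed form. Assuming, for illustration, that the eyes sit at x = ±IPD/2 and that each eye's inward horizontal rotation angle θ is known, the left gaze line is x = -IPD/2 + z·tan θL and the right is x = IPD/2 - z·tan θR, so they cross at depth z = IPD / (tan θL + tan θR). For symmetric fixation this is (IPD/2) / tan θ. The function name is hypothetical.

```python
import math


def vergence_depth(ipd_m: float, left_inward_rad: float, right_inward_rad: float) -> float:
    """Depth along the forward axis at which the two horizontal gaze lines cross.

    Eyes sit at x = -ipd/2 and x = +ipd/2; each angle is that eye's rotation toward
    the nose. Returns math.inf for parallel or diverging gaze.
    """
    denom = math.tan(left_inward_rad) + math.tan(right_inward_rad)
    if denom <= 0.0:
        return math.inf
    return ipd_m / denom
```

The resulting depth is the kind of value that could serve as the target focal distance for a varifocal subsystem.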
The vergence-processing module may coordinate with the eye-tracking subsystems described herein to make adjustments to the display subsystem to account for a user's vergence depth. When the user is focused on something at a distance, the user's pupils may be slightly farther apart than when the user is focused on something close. The eye-tracking subsystem may obtain information about the user's vergence or focus depth and may adjust the display subsystem (e.g., its left and right display elements) to be closer together when the user's eyes focus or verge on something close and farther apart when the user's eyes focus or verge on something at a distance.
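The size of this effect can be estimated with a simple geometric model. The sketch assumes, for illustration only, that the far-gaze IPD equals the separation of the eyes' rotation centers, that fixation is on the midline, and that each pupil sits a fixed distance in front of its rotation center (here an assumed 10 mm). The function name and that default are hypothetical.

```python
import math


def pupil_separation_at_depth(far_ipd_mm: float, depth_mm: float,
                              pivot_to_pupil_mm: float = 10.0) -> float:
    """Approximate pupil separation when both eyes converge on a midline point."""
    if math.isinf(depth_mm):
        return far_ipd_mm  # parallel gaze: pupils directly ahead of the rotation centers
    # Inward rotation of each eye needed to fixate the point at depth_mm.
    theta = math.atan((far_ipd_mm / 2.0) / depth_mm)
    # Each pupil swings toward the nose by pivot_to_pupil_mm * sin(theta).
    return far_ipd_mm - 2.0 * pivot_to_pupil_mm * math.sin(theta)
```

With a 64 mm far IPD, this model gives roughly 62.4 mm at a 400 mm fixation distance, which illustrates why the display elements may be moved slightly closer together for near content.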
The eye-tracking information generated by the above-described eye-tracking subsystems may also be used, for example, to modify various aspects of how different computer-generated images are presented. For example, a display subsystem may be configured to modify, based on information generated by an eye-tracking subsystem, at least one aspect of how the computer-generated images are presented. For instance, the computer-generated images may be modified based on the user's eye movement, such that if a user is looking up, the computer-generated images may be moved upward on the screen. Similarly, if the user is looking to the side or down, the computer-generated images may be moved to the side or downward on the screen. If the user's eyes are closed, the computer-generated images may be paused or removed from the display and resumed once the user's eyes are back open.
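A minimal sketch of this gaze-contingent behavior, assuming that the eye tracker reports a gaze offset already converted to display units along with an eyes-open flag. The `Overlay` type, the function name, and the `gain` parameter are hypothetical. The display's y axis is assumed to point up, so a positive `gaze_dy` means the user looked up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Overlay:
    x: float
    y: float
    visible: bool = True


def update_overlay(overlay: Overlay, gaze_dx: float, gaze_dy: float,
                   eyes_open: bool, gain: float = 1.0) -> Overlay:
    """Shift the overlay with the gaze offset; hide it while the eyes are closed."""
    if not eyes_open:
        # Keep the last position so the overlay resumes where it was.
        return replace(overlay, visible=False)
    return Overlay(overlay.x + gain * gaze_dx, overlay.y + gain * gaze_dy, visible=True)
```

Returning a new immutable `Overlay` on each update keeps the previous state available, for example for smoothing between frames.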
The above-described eye-tracking subsystems can be incorporated into one or more of the various artificial reality systems described herein in a variety of ways. For example, one or more of the various components of system 1800 and/or eye-tracking subsystem 1900 may be incorporated into augmented-reality system 1300 in
Dongle portion 2120 may include antenna 2152, which may be configured to communicate with antenna 2150 included as part of wearable portion 2110. Communication between antennas 2150 and 2152 may occur using any suitable wireless technology and protocol, non-limiting examples of which include radiofrequency signaling and BLUETOOTH. As shown, the signals received by antenna 2152 of dongle portion 2120 may be provided to a host computer for further processing, display, and/or for effecting control of a particular physical or virtual object or objects.
Although the examples provided with reference to
Biosignals (e.g., biopotential signals) measured or recorded by electrodes 2210 may be small, and amplification of the biosignals recorded by electrodes 2210 may be desired. As shown in
As shown in
As detailed above, the computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein. In their most basic configuration, these computing device(s) may each include at least one memory device and at least one physical processor.
In some examples, the term “memory device” generally refers to any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer-readable instructions. In one example, a memory device may store, load, and/or maintain one or more of the modules described herein. Examples of memory devices include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Hard Disk Drives (HDDs), Solid-State Drives (SSDs), optical disk drives, caches, variations or combinations of one or more of the same, or any other suitable storage memory.
In some examples, the term “physical processor” generally refers to any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions. In one example, a physical processor may access and/or modify one or more modules stored in the above-described memory device. Examples of physical processors include, without limitation, microprocessors, microcontrollers, Central Processing Units (CPUs), Field-Programmable Gate Arrays (FPGAs) that implement softcore processors, Application-Specific Integrated Circuits (ASICs), portions of one or more of the same, variations or combinations of one or more of the same, or any other suitable physical processor.
Although illustrated as separate elements, the modules described and/or illustrated herein may represent portions of a single module or application. In addition, in certain embodiments one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein. One or more of these modules may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules recited herein may receive biosignals (e.g., biosignals containing eye-tracking data) to be transformed, transform the biosignals into a prediction of a user's intention to interact, output a result of the transformation to an intelligent-facilitation subsystem, and/or use the result of the transformation to suggest potential targets to the user and/or enable the user to select or interact with these suggested targets through a low-friction interaction. Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
In some embodiments, the term “computer-readable medium” generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and BLU-RAY disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the present disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the present disclosure.
Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification and claims, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word “comprising.”
This application claims the benefit of U.S. Provisional Application No. 63/142,415, filed 27 Jan. 2021, the disclosures of each of which are incorporated, in their entirety, by this reference.
Number | Date | Country
---|---|---
63142415 | Jan 2021 | US