Real-time eye tracking may be used to estimate and map a user's gaze direction to coordinates on a display device. For example, a location on a display at which a user's gaze direction intersects the display may be used as a mechanism for interacting with user interface objects displayed on the display. Various methods of eye tracking may be used. For example, in some approaches, light, e.g., in the infrared range or any other suitable frequency, from one or more light sources may be directed toward a user's eye, and a camera may be used to capture image data of the user's eye. Locations of reflections of the light on the user's eye and a position of the pupil of the eye may be detected in the image data to determine a direction of the user's gaze. Gaze direction information may be used in combination with information regarding a distance from the user's eye to a display to determine the location on the display at which the user's eye gaze direction intersects the display.
Embodiments related to eye tracking utilizing time-of-flight depth image data of the user's eye are disclosed. For example, one disclosed embodiment provides an eye tracking system comprising a light source, a sensing subsystem configured to obtain a two-dimensional image of a user's eye and depth data of the user's eye, and a logic subsystem to control the light source to emit light, control the sensing subsystem to acquire a two-dimensional image of the user's eye while emitting light from the light source, control the sensing subsystem to acquire depth data of the user's eye, determine a gaze direction of the user's eye from the two-dimensional image, determine a location on a display at which the user's gaze intersects the display based on the gaze direction and the depth of the user's eye obtained from the depth data, and output the location.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
As described above, eye tracking may be used to map a user's gaze to a user interface displayed on a display device based upon an estimated location at which the gaze intersects the display device. The location at which a user's gaze direction intersects the display device thus may act as a user input mechanism for the user interface.
Eye tracking may be performed in a variety of ways. For example, as described above, glints, i.e., light from calibrated light sources as reflected from a user's eyes, together with detected or estimated pupil locations of the user's eyes, may be used to determine a direction of the user's gaze. A distance from the user's eyes to a display device may then be estimated or detected to determine the location on the display at which the user's gaze direction intersects the display. As one example, stereo cameras having a fixed or otherwise known relationship to the display may be used to determine the distance from the user's eyes to the display. However, as described below, stereo cameras may impose geometric constraints that make their use difficult in some environments.
Eye tracking may be used in a variety of different hardware environments. For example, eye tracking may be employed with head-mounted displays, smart phones, tablet computers, and other display devices.
In these and/or other hardware settings, the accuracy and stability of the eye tracking system may be dependent upon obtaining an accurate estimate of the distance of the eye from the camera plane. Current eye tracking systems may solve this problem through the use of a stereo camera pair to estimate the three-dimensional eye position using computer vision algorithms.
However, the baseline distance 412 between the first camera 406 and second camera 408 may be geometrically constrained to being greater than a threshold distance (e.g., greater than 10 cm) for accurate determination (triangulation) of the distance between the user's eye 114 and the display 402. This may limit the ability to reduce the size of such an eye tracking unit, and may be difficult to use with some hardware configurations, such as a head-mounted display or other compact display device.
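To illustrate this constraint, the following sketch applies the standard rectified-stereo relation, in which depth z = f·B/d for focal length f (in pixels), baseline B, and disparity d, so that a disparity error of Δd produces a depth error of roughly z²·Δd/(f·B). The focal length, viewing distance, and disparity noise used below are assumed illustrative values, not figures from this disclosure; the sketch only shows how shrinking the baseline inflates the depth error.

```python
import math

def stereo_depth_error(depth_m, baseline_m, focal_px, disparity_noise_px=0.25):
    """Approximate depth uncertainty of a rectified stereo pair.

    Depth z = f * B / d, where f is the focal length in pixels, B the
    baseline, and d the disparity. Differentiating gives
    |dz| ~= z**2 / (f * B) * |dd|, so depth error grows as the baseline
    shrinks.
    """
    return (depth_m ** 2) / (focal_px * baseline_m) * disparity_noise_px

# Illustrative numbers only: a user ~0.6 m from the display, a 1000-pixel
# focal length, and quarter-pixel disparity noise.
for baseline in (0.02, 0.05, 0.10, 0.20):
    err = stereo_depth_error(0.6, baseline, 1000.0)
    print(f"baseline {baseline * 100:4.0f} cm -> depth error ~{err * 1000:.1f} mm")
```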
Other approaches to determining a distance between a user's eye and a display may rely on a single camera system and utilize a weak estimation of the eye distance. However, such approaches may result in an unstable mapping between actual gaze location and screen coordinates.
Accordingly, embodiments are disclosed herein that relate to the use of a depth sensor having an unconstrained baseline distance (i.e., no minimum baseline distance, as opposed to a stereo camera arrangement) in an eye tracking system to obtain information about the location and position of a user's eyes. One example of such a depth sensor is a time-of-flight depth camera. A time-of-flight depth camera utilizes a light source configured to emit pulses of light, and one or more image sensors configured to be shuttered to capture a series of temporally sequential image frames timed relative to a corresponding light pulse. Depth at each pixel of an image sensor in the depth camera, i.e., the effective distance that light emitted from the light source and reflected by an object travels from the object to that pixel of the image sensor, may be determined based upon the light intensity in each sequential image, as light reflected from objects at different depths is captured in different sequential image frames.
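As an illustration of the principle described above, the following sketch estimates per-pixel depth under one common simplified pulsed (range-gated) time-of-flight model with two shutter windows. Real depth cameras typically use more gates, modulation schemes, and calibration, so this is only an assumed idealization; the pulse width and intensity values below are hypothetical.

```python
import numpy as np

# Speed of light in meters per second.
C = 3.0e8

def pulsed_tof_depth(q1, q2, pulse_width_s):
    """Estimate per-pixel depth from two gated intensity images.

    Assumes an idealized pulsed time-of-flight model: a rectangular light
    pulse of width T is emitted, gate 1 integrates returning light during
    [0, T] and gate 2 during [T, 2T]. Light from farther objects arrives
    later, so more of its energy falls into gate 2, and the intensity
    ratio encodes depth:

        depth = (c * T / 2) * q2 / (q1 + q2)
    """
    q1 = np.asarray(q1, dtype=np.float64)
    q2 = np.asarray(q2, dtype=np.float64)
    total = q1 + q2
    # Avoid division by zero at pixels where no light returned.
    ratio = np.divide(q2, total, out=np.zeros_like(total), where=total > 0)
    return (C * pulse_width_s / 2.0) * ratio

# Example: a 30 ns pulse gives an unambiguous range of ~4.5 m.
q1 = np.array([[800.0, 500.0], [300.0, 0.0]])
q2 = np.array([[200.0, 500.0], [700.0, 0.0]])
print(pulsed_tof_depth(q1, q2, 30e-9))
```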
As a time-of-flight depth camera may acquire image data from a single location, rather than from two locations as with a stereo pair of image sensors, an eye tracking system utilizing a time-of-flight depth camera may not have minimum baseline dimensional constraints as found with stereo camera configurations. This may allow the eye tracking system to be more easily utilized in hardware configurations such as head-mounted displays, smart phones, tablet computers, and other small devices where sufficient space for a stereo camera eye tracking system may not be available. Other examples of depth sensors with unconstrained baseline distances may include, but are not limited to, LIDAR (Light Detection and Ranging) and sound propagation-based methods.
Eye tracking module 500 includes a sensing subsystem 506 configured to obtain a two-dimensional image of a user's eye and also depth data of the user's eye. For example, the sensing subsystem 506 may include a time-of-flight depth camera 504, where the time-of-flight depth camera 504 includes a light source 510 and one or more image sensors 512. As described above, the light source 510 may be configured to emit pulses of light, and the one or more image sensors 512 may be configured to be shuttered to capture a series of temporally sequential image frames timed relative to a corresponding light pulse. Depth at each pixel, i.e., the effective distance that light emitted from the light source and reflected by an object travels from the object to that pixel of the image sensor, may be determined based upon the light intensity in each sequential image, as light reflected from objects at different depths is captured in different sequential image frames. It will be appreciated that any other depth sensor having an unconstrained baseline distance may be used in other embodiments instead of, or in addition to, the time-of-flight depth camera 504.
In some examples, the image sensor(s) 512 included in depth camera 504 also may be used to acquire two-dimensional image data (i.e. intensity data as a function of horizontal and vertical position in a field of view of the image sensor, instead of depth) to determine a location of a reflection and a pupil of a user's eye, in addition to depth data. For example, all of the sequential images for a depth measurement may be summed to determine a total light intensity at each pixel. In other embodiments, one or more separate image sensors may be utilized to detect images of the user's pupil and reflections of light source light from the user's eye, as shown by two-dimensional camera(s) 514.
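A minimal sketch of the summation described above might look as follows; the array shapes and frame data are hypothetical, and a real system would also handle sensor-specific normalization and saturation.

```python
import numpy as np

def intensity_image_from_gated_frames(frames):
    """Collapse a stack of sequentially shuttered depth-camera frames into
    a single two-dimensional intensity image by summing per pixel.

    `frames` is expected to have shape (num_gates, height, width), i.e.
    one image per shutter window of the depth measurement.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames.sum(axis=0)

# Example with three hypothetical 4x4 gated frames.
rng = np.random.default_rng(0)
gated = rng.uniform(0.0, 255.0, size=(3, 4, 4))
two_d_image = intensity_image_from_gated_frames(gated)
print(two_d_image.shape)  # (4, 4)
```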
In some embodiments, a single two-dimensional camera 514 may be used along with a time-of-flight depth camera. In other embodiments, the sensing subsystem 506 may utilize more than one two-dimensional camera, in addition to a time-of-flight depth camera. For example, the sensing subsystem 506 may utilize a first two-dimensional camera to obtain a relatively wider field of view image to help locate a position of the eyes of a user. This may help to find and track eye sockets of the user, so that regions of the user containing the user's eyes may be identified. Further, a second two-dimensional camera may be used to capture a higher resolution image of a narrower field of view directed at the identified regions of the user's eye to acquire eye-tracking data. By roughly identifying eye location in this manner, the spatial region that is analyzed for pupil and corneal pattern detection may be reduced in the higher resolution image, as non-eye regions as determined from the lower resolution image data may be ignored when analyzing the higher resolution image data.
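A coarse-to-fine arrangement of this kind could be sketched as follows. The detector, the fixed placeholder region it returns, and the assumption that the two cameras are registered closely enough for a simple scaling between image coordinates are hypothetical simplifications; an actual system would use a real eye or face detector and a calibrated mapping between the cameras.

```python
import numpy as np

def locate_eye_regions(wide_image):
    """Hypothetical coarse eye-region detector operating on the wide
    field-of-view image. In practice this could be a cascade or CNN
    detector; here it simply returns one fixed placeholder region
    (x, y, width, height) in wide-image coordinates."""
    h, w = wide_image.shape[:2]
    return [(w // 4, h // 4, w // 8, h // 16)]

def map_roi_to_narrow(roi, wide_shape, narrow_shape):
    """Scale a region of interest from wide-camera coordinates into
    narrow (high-resolution) camera coordinates, assuming the two cameras
    are registered so that a simple scaling suffices."""
    x, y, rw, rh = roi
    sy = narrow_shape[0] / wide_shape[0]
    sx = narrow_shape[1] / wide_shape[1]
    return (int(x * sx), int(y * sy), int(rw * sx), int(rh * sy))

# Analyze only the mapped region of the high-resolution image.
wide = np.zeros((480, 640))
narrow = np.zeros((1080, 1920))
for roi in locate_eye_regions(wide):
    x, y, rw, rh = map_roi_to_narrow(roi, wide.shape, narrow.shape)
    eye_patch = narrow[y:y + rh, x:x + rw]  # region passed to pupil/glint analysis
```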
In some embodiments, the depth camera may operate in the infrared range and the additional camera 514 may operate in the visible range. For example, an eye-tracking module may comprise a depth camera and a visible-range high-resolution camera (e.g., a front-facing camera on a slate).
In some embodiments, the eye tracking module 500 also may include a light source 518, different from the light source 510 of depth camera 504, to provide light for generating corneal reflections. Any suitable light source may be used as the light source 518. For example, the light source 518 may comprise one or more infrared light-emitting diodes (LEDs) positioned at any suitable location relative to an optical axis of the user's eye when the user is gazing forward. Any suitable combination of light sources may be used, and the light sources may be illuminated in any suitable temporal pattern. In other embodiments, the light source 510 of the time-of-flight depth camera 504 may be configured to also serve as a light source for generating reflections from a user's eye. It will be understood that these embodiments are described for the purpose of example, and are not intended to be limiting in any manner.
Eye tracking module 500 further includes a logic subsystem 520 and a storage subsystem 522 comprising instructions stored thereon that are executable by the logic subsystem to perform various tasks, including but not limited to tasks related to eye tracking and to user interface interactions utilizing eye tracking. More detail regarding computing system hardware is described below.
In order to determine a rotation of the user's eye 114, each reflection provides a reference against which the pupil can be compared to determine a direction of eye rotation. As such, the two-dimensional camera 514 may acquire two-dimensional image data of the light reflected 606 from the user's eye. The location of the pupil 116 of the user's eye 114 and the location of the reflection may be determined from the two-dimensional image data. The gaze direction 118 may then be determined from the location of the pupil and the location of the reflection.
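One simplified way to realize this comparison is a pupil-center/corneal-reflection style mapping, sketched below: the vector from the glint to the pupil center is mapped to gaze angles through a calibrated fit. The affine form of the mapping, the calibration samples, and the pixel coordinates are assumptions for illustration and are not necessarily the mapping used by the disclosed embodiments.

```python
import numpy as np

def pupil_glint_vector(pupil_px, glint_px):
    """Vector from the corneal reflection (glint) to the pupil center, in
    image pixels. This vector changes as the eye rotates while the glint
    stays approximately fixed for a fixed light source and head position."""
    return np.asarray(pupil_px, float) - np.asarray(glint_px, float)

def fit_gaze_mapping(vectors, gaze_angles):
    """Fit a simple affine mapping from pupil-glint vectors to gaze angles
    (yaw, pitch) using calibration samples gathered while the user fixated
    known targets. Returns a (2 x 3) matrix applied to [vx, vy, 1]."""
    V = np.hstack([np.asarray(vectors, float), np.ones((len(vectors), 1))])
    A, *_ = np.linalg.lstsq(V, np.asarray(gaze_angles, float), rcond=None)
    return A.T

def estimate_gaze(A, pupil_px, glint_px):
    v = pupil_glint_vector(pupil_px, glint_px)
    return A @ np.array([v[0], v[1], 1.0])  # (yaw, pitch) in the calibrated units

# Calibration samples (pupil-glint vectors and target angles) are assumed data.
vecs = [(10, 4), (22, 5), (9, 18), (21, 17)]
angles = [(-5, -3), (5, -3), (-5, 3), (5, 3)]   # degrees, illustrative only
A = fit_gaze_mapping(vecs, angles)
print(estimate_gaze(A, pupil_px=(200, 120), glint_px=(185, 110)))
```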
Further, the depth camera 504 may acquire a time-of-flight depth image via light reflected 608 from the eye that arises from a light pulse 609 emitted by the depth camera light source. The depth image then may be used to detect a distance of the user's eye from the display. The angle or positioning of the depth camera 504 with respect to the display 120 may be fixed, or otherwise known (e.g. via a calibration process). Thus, the two-dimensional image data and depth data may be used to determine and output a location at which the gaze direction intersects the display.
At 704, method 700 includes emitting light from a light source. Any suitable light source may be used. For example, the light source may comprise one or more infrared light-emitting diodes (LEDs) positioned on or off axis. Any suitable combination of on-axis and off-axis light sources may be used, and the light sources may be illuminated in any suitable temporal pattern. Further, in some examples, the light source may comprise a light source incorporated in a time-of-flight depth camera. It will be understood that these embodiments are described for the purpose of example, and are not intended to be limiting in any manner.
Method 700 further includes, at 706, acquiring an image of the eye while emitting light from the light source. For example, a two-dimensional image of the eye may be obtained via a dedicated two-dimensional camera, or time-of-flight depth data may be summed across all sequentially shuttered images for a depth measurement. Further, at 708, method 700 includes acquiring a time-of-flight image of the eye, for example, via a time-of-flight depth camera, or otherwise acquiring depth data of the eye via a suitable depth sensor having an unconstrained baseline distance.
At 710, method 700 includes detecting a location of a pupil of the eye from the two-dimensional image data. Any suitable optical and/or image processing methods may be used to detect the location of the pupil of the eye. For example, in some embodiments, a bright pupil effect may be produced to help detect the position of the pupil of the eye. In other embodiments, the pupil may be located without the use of a bright pupil effect. At 712, method 700 further includes detecting a location of one or more reflections from the eye from the two-dimensional image data. It will be understood that various techniques may be used to distinguish reflections arising from eye tracking light sources from reflections arising from environmental sources. For example, an ambient-only image may be acquired with all light sources turned off, and the ambient-only image may be subtracted from an image acquired with the light sources on to remove environmental reflections from the image.
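The ambient-subtraction idea described at 712 could be sketched as follows; the threshold, the 4-connectivity used to group bright pixels, and the synthetic images are illustrative assumptions.

```python
import numpy as np

def remove_ambient(lit_image, ambient_image):
    """Subtract an image captured with all eye-tracking light sources off
    from one captured with them on, leaving (approximately) only the
    reflections produced by the tracking light sources."""
    diff = lit_image.astype(np.float64) - ambient_image.astype(np.float64)
    return np.clip(diff, 0.0, None)

def find_glints(difference_image, threshold=200.0):
    """Return centroids (row, col) of bright connected regions in the
    ambient-subtracted image; these are candidate corneal reflections.
    A simple flood fill groups 4-connected bright pixels."""
    mask = difference_image >= threshold
    visited = np.zeros_like(mask, dtype=bool)
    centroids = []
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not visited[r, c]:
                stack, pixels = [(r, c)], []
                visited[r, c] = True
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                ys, xs = zip(*pixels)
                centroids.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centroids

# Synthetic example: a 2x2 bright patch stands in for a corneal reflection.
rng = np.random.default_rng(1)
ambient = rng.uniform(0, 40, size=(32, 32))
lit = ambient.copy()
lit[10:12, 20:22] += 255.0
print(find_glints(remove_ambient(lit, ambient)))  # ~[(10.5, 20.5)]
```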
Method 700 further includes, at 714, determining a gaze direction of the eye from the location of the pupil and the location of reflections on the user's eye arising from the light sources. The reflection or reflections provide one or more references to which the pupil can be compared for determining a direction in which the eye is gazing.
At 716, method 700 includes determining a distance from the eye to a display. For example, the time-of-flight image data of the eye may be used to determine a distance from the eye to an image sensor in the depth camera. The distance from the eye to the image sensor may then be used to determine a distance along the gaze direction of the eye to the display. From this information, at 718, method 700 includes determining and outputting a location on a display at which the gaze direction intersects the display.
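For illustration, the computation at 716 and 718 could proceed along the following lines: the eye's pixel location and time-of-flight depth are back-projected into a three-dimensional point using a pinhole camera model, and the gaze ray from that point is intersected with the display plane, whose pose relative to the camera is assumed known (e.g., from calibration). The camera intrinsics, gaze direction, and display pose below are hypothetical values.

```python
import numpy as np

def eye_position_from_depth(pixel, depth_m, fx, fy, cx, cy):
    """Back-project the eye's pixel location and time-of-flight depth into
    a 3-D point in the depth-camera coordinate frame (pinhole model)."""
    u, v = pixel
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def intersect_display(eye_pos, gaze_dir, plane_point, plane_normal):
    """Intersect the gaze ray (eye_pos + t * gaze_dir) with the display
    plane, defined by a point on the plane and its normal, all expressed
    in the same (camera) coordinate frame. Returns the 3-D intersection
    point, or None if the gaze is parallel to or pointing away from the
    display."""
    gaze_dir = np.asarray(gaze_dir, float)
    plane_normal = np.asarray(plane_normal, float)
    denom = np.dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-9:
        return None
    t = np.dot(plane_normal, np.asarray(plane_point, float) - eye_pos) / denom
    return None if t < 0 else eye_pos + t * gaze_dir

# Illustrative values: eye imaged at pixel (320, 240), 0.6 m away, display
# plane at z = 0 with normal +z in the camera frame.
eye = eye_position_from_depth((320, 240), 0.6, fx=580.0, fy=580.0, cx=320.0, cy=240.0)
hit = intersect_display(eye, gaze_dir=(0.05, -0.02, -1.0),
                        plane_point=(0, 0, 0), plane_normal=(0, 0, 1))
print(hit)  # 3-D point on the display plane
```

The resulting three-dimensional intersection point may then be converted to screen coordinates using the display's known physical dimensions and resolution.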
Thus, the disclosed embodiments may allow for a stable and accurate eye tracking system without the use of a stereo camera, and thus without the use of a large minimum baseline constraint that may be found with stereo camera systems. This may allow for the production of compact modular eye tracking systems that can be incorporated into any suitable device.
Computing system 800 includes a logic subsystem 802 and a storage subsystem 804. Computing system 800 may optionally include an output subsystem 806, input subsystem 808, communication subsystem 810, and/or other components not shown.
Logic subsystem 802 includes one or more physical devices configured to execute instructions. For example, the logic subsystem may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, or otherwise arrive at a desired result.
The logic subsystem may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic subsystem may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the logic subsystem may be single-core or multi-core, and the programs executed thereon may be configured for sequential, parallel, or distributed processing. In some examples, the logic subsystem may comprise a graphics processing unit (GPU). The logic subsystem may optionally include individual components that are distributed among two or more devices, which can be remotely located and/or configured for coordinated processing. Aspects of the logic subsystem may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.
Storage subsystem 804 includes one or more physical devices configured to hold data and/or instructions executable by the logic subsystem to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage subsystem 804 may be transformed—e.g., to hold different data.
Storage subsystem 804 may include removable computer-readable media and/or built-in computer readable media devices. Storage subsystem 804 may include optical memory devices (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory devices (e.g., RAM, EPROM, EEPROM, etc.) and/or magnetic memory devices (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage subsystem 804 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
It will be appreciated that storage subsystem 804 includes one or more physical devices and excludes propagating signals per se. However, in some embodiments, aspects of the instructions described herein may be propagated by a pure signal (e.g., an electromagnetic signal, an optical signal, etc.) via a communications medium, as opposed to being stored on a storage device comprising a computer readable storage medium. Furthermore, data and/or other forms of information pertaining to the present disclosure may be propagated by a pure signal.
In some embodiments, aspects of logic subsystem 802 and of storage subsystem 804 may be integrated together into one or more hardware-logic components through which the functionality described herein may be enacted. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC) systems, and complex programmable logic devices (CPLDs), for example.
When included, output subsystem 806 may be used to present a visual representation of data held by storage subsystem 804. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage subsystem, and thus transform the state of the storage subsystem, the state of output subsystem 806 may likewise be transformed to visually represent changes in the underlying data. Output subsystem 806 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 802 and/or storage subsystem 804 in a shared enclosure, or such display devices may be peripheral display devices.
When included, input subsystem 808 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.
When included, communication subsystem 810 may be configured to communicatively couple computing system 800 with one or more other computing devices. Communication subsystem 810 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 800 to send and/or receive messages to and/or from other devices via a network such as the Internet.
It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.