The present invention relates generally to systems and methods for eye tracking that are implemented for gaze determination, e.g., determining locations in space or object(s) being viewed by one or both eyes. In particular, the gaze-determination systems and methods herein may enable point-of-gaze determination in a wearable device without the need for head-tracking after calibration.
The systems and methods herein relate to gaze tracking using a wearable eye-tracking device that utilizes head-pose estimation to improve gaze accuracy. The use of head tracking allows the system to know the user's head position in relation to the monitor. This enables the user to accurately interact with an electronic display or other monitor (e.g., control a pointer) using his/her gaze.
Many wearable eye-tracking devices do not include head-pose estimation. However, minor shifts in head pose can introduce ambiguity in eye trackers that rely only on the eye's visual axis when determining the gaze vector. Knowledge of the head pose can extend the range of accuracy of a gaze-tracking system.
The present invention is directed to systems and methods for eye tracking that are implemented for gaze determination, e.g., determining locations in space or object(s) being viewed by one or both eyes. In particular, the gaze-determination systems and methods herein may enable point-of-gaze determination in a wearable device without the need for head-tracking after calibration.
In accordance with an exemplary embodiment, a method is provided for eye tracking that includes one or more steps, such as calibrating a wearable device before the wearable device is worn by a user; placing the wearable device on a user's head adjacent one or both of the user's eyes; calibrating the wearable device after placing the wearable device on the user's head; detecting at least one eye feature of a first eye of the user's eyes; performing a compensation algorithm; and calculating a gaze direction of the user.
In accordance with another embodiment, a system is provided for eye tracking that includes a wearable device configured to be worn on a user's head; an exo-camera on the wearable device configured to provide images of a user's surroundings when the wearable device is worn by the user; an endo-camera on the wearable device configured to provide images of a first eye of the user when the wearable device is worn by the user; and one or more processors configured for one or more of calibrating the wearable device before the wearable device is worn by a user; calibrating the wearable device after placing the wearable device on the user's head; detecting at least one eye feature of a first eye of the user's eyes; performing a compensation algorithm; and calculating a gaze direction of the user.
In accordance with still another embodiment, a method is provided for compensating for movement of a wearable eye tracking device relative to a user's eye that includes wearing a wearable device on a user's head such that one or more endo-cameras are positioned to acquire images of one or both of the user's eyes, and an exo-camera is positioned to acquire images of the user's surroundings; calculating the location of features in a user's eye that cannot be directly observed from images of the eye acquired by an endo-camera; and spatially transforming camera coordinate systems of the exo- and endo-cameras to place calculated eye features in a known location and alignment.
Other aspects and features of the present invention will become apparent from consideration of the following description taken in conjunction with the accompanying drawings.
The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It will be appreciated that the exemplary apparatus shown in the drawings are not necessarily drawn to scale, with emphasis instead being placed on illustrating the various aspects and features of the illustrated embodiments.
The present invention may provide apparatus, systems, and methods for head tracking and gaze tracking that include one or more of the following features:
One of the hurdles to accurate gaze-mapping in a mobile wearable eye-tracking device is finding a user-friendly method to determine head pose information. In many cases, a user is comfortable with a short user-specific calibration. The main advantage of the gaze determination method disclosed herein is that point-of-regard may be maintained with or without head tracking after calibration. This is accomplished by estimating the point in space where the user is looking and projecting it onto the scene image. This allows for gaze determination in a plethora of environments not restricted to a computer desk.
Turning to the drawings,
Turning to
Illumination Source Calibration Step: The first step in calibrating the glint locations in endo-camera images with the light source locations on the wearable device is to acquire a set of perspective images with a secondary reflective surface and light source(s). For example, images of a mirror placed near the working distance of the camera may be acquired, where the mirror's edges are surrounded by LEDs and the mirror is placed in front of the camera such that the glint-generating LEDs may be seen in the image. The second step is to use a software program to mark and extract the positions of the light sources surrounding the mirror and the reflections in the mirror of the glint-generating light sources on the wearable device. The next step is to determine the homography between the image and the plane of the reflective surface. This homography is then applied to the glint light source in the image plane to get the three-dimensional (3D) point corresponding to the light source on the reflective surface. With these 3D locations in space, the ray from the light source that generated the glint to the reflective surface may be determined. These steps are repeated for each of the perspective images. For each glint source, the intersection of the rays calculated across the acquired perspective images is then determined, yielding the 3D location of that light source on the wearable device.
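For illustration only, the following Python sketch outlines the plane-homography and ray-intersection sub-steps described above. OpenCV and NumPy are assumed to be available, and the marked LED coordinates, mirror layout, and function names are hypothetical placeholders rather than values from the present disclosure.

```python
# Illustrative sketch (not the disclosed implementation) of two sub-steps:
# (1) mapping a marked glint reflection through the image-to-mirror-plane
#     homography, and (2) intersecting the rays recovered from several
#     perspective images in a least-squares sense.
import numpy as np
import cv2

def glint_point_on_mirror(H, glint_px):
    """Apply the image-to-mirror-plane homography to a marked glint pixel."""
    p = np.array([glint_px[0], glint_px[1], 1.0])
    q = H @ p
    return q[:2] / q[2]                      # point in mirror-plane coordinates

def intersect_rays(origins, directions):
    """Least-squares point closest to all rays o_i + t * d_i (one ray per image)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)       # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)             # estimated 3D light-source location

# One perspective image: marked LED positions around the mirror (pixels) and
# their known layout on the mirror plane (mm) -- placeholder values.
led_px = np.array([[102, 88], [540, 91], [537, 402], [99, 398]], dtype=np.float32)
led_mm = np.array([[0, 0], [80, 0], [80, 60], [0, 60]], dtype=np.float32)
H, _ = cv2.findHomography(led_px, led_mm)
mirror_point = glint_point_on_mirror(H, (321, 240))   # glint reflection mapped onto the mirror plane
```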
Camera-Camera Calibration Step: Two standard checkerboards, and/or other known geometric patterns, are positioned such that one pattern substantially fills the field of view of each of the exo-camera 20 and the endo-camera(s) 30, e.g., positioned at a near-optimal working distance of the respective camera, i.e., with the object at near best focus. The positions of the checkerboards remain substantially fixed during camera-to-camera calibration. The wearable device is moved between the patterns while several sets of images with the patterns in full view are acquired from both the endo-camera(s) 30 and the exo-camera 20 (eye and scene camera, respectively). Each set of images yields a set of three equations. Multiple sets of images yield an overdetermined matrix equation of the form Ax=B, which may be solved with singular value decomposition (SVD) to obtain the camera-to-camera transformation.
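As a hedged illustration of the final solve only, the sketch below stacks per-image-set equations into an overdetermined system Ax=B and solves it with the SVD; the blocks shown are random placeholders standing in for the equations actually derived from the checkerboard images.

```python
# Illustrative sketch of solving the stacked camera-to-camera system A x = B
# with the SVD; the per-image-set blocks here are placeholders, not the actual
# equations derived from the checkerboard poses.
import numpy as np

def solve_overdetermined(A, B):
    """Least-squares solution of A x = B via the singular value decomposition."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > 1e-12, 1.0 / s, 0.0)   # guard near-zero singular values
    return Vt.T @ (s_inv * (U.T @ B))

# Each of the N acquired image sets contributes three equations.
N = 10
blocks_A = [np.random.randn(3, 6) for _ in range(N)]   # placeholder coefficient blocks
blocks_B = [np.random.randn(3) for _ in range(N)]      # placeholder right-hand sides
A = np.vstack(blocks_A)                                 # shape (3 * N, 6), overdetermined
B = np.concatenate(blocks_B)
x = solve_overdetermined(A, B)   # parameters of the endo-to-exo transformation
```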
In addition, the calibration step 110 may then include a User-Specific Calibration. In this step, codes displayed on the monitor in the exo-camera images are registered with an established monitor plane. This provides an estimate of head pose at each calibration and test point in the user's calibration session. The codes may take the form of a variety of patterns comprising contrasting geometric features. The patterns may be displayed on the monitor, constructed of other materials and attached to the monitor, arranged as a series of light sources in a pattern around the monitor, and the like. Additionally, head pose may be estimated using an accelerometer, MEMS device, or other orientation sensor. In the past, accelerometers were bulky, but their overall footprint has been significantly reduced with the incorporation of MEMS technology.
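By way of a non-limiting sketch, head pose can be estimated from such on-screen codes with a standard perspective-n-point solve; the marker layout, detected image points, and camera intrinsics below are illustrative assumptions, not values from this disclosure.

```python
# Illustrative sketch: registering on-screen codes of known monitor-plane
# positions with their detections in an exo-camera image via solvePnP.
import numpy as np
import cv2

# Code positions on the monitor plane (mm, z = 0) and detections (pixels) -- placeholders.
monitor_pts = np.array([[0, 0, 0], [520, 0, 0], [520, 320, 0], [0, 320, 0]], dtype=np.float32)
image_pts = np.array([[212, 140], [428, 150], [420, 300], [205, 292]], dtype=np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)   # assumed exo-camera intrinsics

ok, rvec, tvec = cv2.solvePnP(monitor_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)              # monitor-plane pose expressed in the exo-camera frame
head_R, head_t = R.T, -R.T @ tvec       # inverted: exo-camera (head) pose relative to the monitor
```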
In an exemplary embodiment, user-specific calibration may be performed with mapping techniques, wherein mapping refers to a mathematical function. The function takes raw data as its input and evaluates to calibrated points. For example, a polynomial fit is applied to the entire space, and an output value for any point within that space is determined by the function.
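A minimal sketch of such a mapping, assuming a second-order polynomial and nine illustrative calibration targets, follows; the raw samples and target positions are placeholders.

```python
# Illustrative sketch of calibration by mapping: one polynomial fit over the
# whole calibration space takes raw coordinates to calibrated screen points.
import numpy as np

def poly_features(raw):
    """Second-order polynomial terms for 2D raw samples (N x 2)."""
    x, y = raw[:, 0], raw[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])

raw_pts = np.random.rand(9, 2)            # placeholder raw eye-feature coordinates at 9 targets
screen_pts = np.random.rand(9, 2) * 1000  # placeholder known target positions on the monitor

coeffs, *_ = np.linalg.lstsq(poly_features(raw_pts), screen_pts, rcond=None)

def map_gaze(raw_xy):
    """Evaluate the fitted mapping for a new raw sample."""
    return (poly_features(np.atleast_2d(np.asarray(raw_xy, dtype=float))) @ coeffs)[0]

calibrated = map_gaze((0.4, 0.6))         # calibrated point for a new raw sample
```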
In another exemplary embodiment, user-specific calibration may be performed with interpolation. While mapping covers an entire space of interest, interpolation is performed in a piecewise fashion on specific subregions and localized data. For example, the entire space may be subdivided into four subregions, and linear fits may be applied to each of those subregions by using a weighted average of the corner points of each region. If the number of subregions is increased, the interpolation approaches the polynomial fit of the prior exemplary embodiment.
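The following sketch, assuming one rectangular subregion and placeholder corner values, shows the weighted-average (bilinear) interpolation described for a single subregion.

```python
# Illustrative sketch of piecewise calibration within one subregion: the
# output at a query point is a bilinear blend of the four corner values.
import numpy as np

def bilinear(corners, values, p):
    """corners = (x0, y0, x1, y1); values = 2x2 array of corner outputs; p = (x, y)."""
    x0, y0, x1, y1 = corners
    tx = (p[0] - x0) / (x1 - x0)          # horizontal weight
    ty = (p[1] - y0) / (y1 - y0)          # vertical weight
    top = (1 - tx) * values[0, 0] + tx * values[0, 1]
    bottom = (1 - tx) * values[1, 0] + tx * values[1, 1]
    return (1 - ty) * top + ty * bottom

corner_values = np.array([[0.0, 1.0], [2.0, 3.0]])          # placeholder calibrated outputs at the corners
print(bilinear((0, 0, 1, 1), corner_values, (0.25, 0.75)))  # -> 1.75
```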
In another exemplary embodiment, user-specific calibration may be performed with machine learning. While machine learning may appear to behave like the mathematical functions used in the mapping method, machine learning techniques may internally represent highly irregular mappings that would otherwise require extremely complex mathematical equations, such as discontinuous functions and high-order polynomials. Machine learning techniques also make no assumptions about the types of equations they will model, meaning that the training procedure is identical regardless of the type of mapping ultimately represented. This eliminates, among other things, the need for the designer to understand the relationship between inputs and outputs. Such techniques may also execute very quickly, making them useful in high-performance applications.
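As one hedged possibility (the disclosure does not mandate a particular model), a small neural-network regressor could be trained on the calibration pairs; scikit-learn and the random data below are assumptions for illustration.

```python
# Illustrative sketch: a small regressor learns the raw-to-screen mapping
# without any assumed functional form.  Data and hyperparameters are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

raw_pts = np.random.rand(50, 2)            # placeholder raw eye features collected during calibration
screen_pts = np.random.rand(50, 2) * 1000  # placeholder corresponding target positions

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
model.fit(raw_pts, screen_pts)
prediction = model.predict(raw_pts[:1])    # fast evaluation at run time
```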
Head-Pose Estimation Step: In the head-pose estimation step 112 in
The following steps may occur during and/or after the user calibration step:
Pupil Detection Step: An exemplary embodiment of the pupil detection step 114 in
Glint Detection Step: For the glint detection step 116 of
Cornea Center Calculation Step: Next, the normalization step 118 of
Once the gaze vector in the endo-camera coordinate system is obtained, it may be mapped either to a point-of-regard (POR) overlay or to the monitor plane if mouse or pointer control is required. In the case of two-dimensional (2D) POR and pointer control, head pose continues to be calculated. In either scenario, accurate gaze determination with unrestricted head movement may be accomplished through proper normalization and denormalization of the endo- (toward the eye) and exo- (outward-looking) camera spaces relative to a virtual plane. The image pupil point is projected onto the virtual plane (the mapping between the endo-, exo-, and virtual coordinate spaces is determined during calibration). The gaze point is then found by intersecting the line formed by the cornea center and the virtual-plane point with the monitor.
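A minimal sketch of that final intersection, assuming illustrative coordinates for the cornea center, the projected virtual-plane point, and the monitor plane, is given below.

```python
# Illustrative sketch: the gaze ray through the cornea center and the
# virtual-plane point is intersected with the monitor plane.
import numpy as np

def line_plane_intersection(p0, p1, plane_point, plane_normal):
    """Intersect the line through p0 and p1 with the plane (plane_point, plane_normal)."""
    d = p1 - p0
    t = np.dot(plane_normal, plane_point - p0) / np.dot(plane_normal, d)
    return p0 + t * d

cornea_center = np.array([0.0, 0.0, 0.0])            # placeholder cornea center
virtual_point = np.array([0.05, -0.02, 0.5])         # placeholder pupil point projected onto the virtual plane
monitor_point = np.array([0.0, 0.0, 0.6])            # placeholder point on the monitor plane
monitor_normal = np.array([0.0, 0.0, 1.0])           # placeholder monitor-plane normal

gaze_point = line_plane_intersection(cornea_center, virtual_point, monitor_point, monitor_normal)
```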
Because the frames are not fixed to the person, the user could move the frames while still looking at the same spot on the virtual plane. A processor analyzing the endo-camera images would detect that the center of the eye had moved and would project it to a different spot on the virtual plane. To rectify this problem, the center of the pupil is normalized. The cornea center is used as a reference point and, in every frame, is transformed to a specific, predetermined position. The normalized pupil position is then determined on the shifted cornea image.
Essentially, the normalization puts the cornea in the same position in every frame of the endo-camera images, e.g., within an x-y-z reference frame. First, the normalization includes a rotation that rotates the cornea about the origin and places it on the z-axis. This rotation is determined by restricting it to a rotation around the x-axis followed by a rotation around the y-axis. The translation is determined by calculating the translation required to move the rotated cornea to a predetermined value on the z-axis. Because the rotation is applied before the translation, the translation contains only a z value.
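The following sketch, with an assumed target depth and placeholder cornea coordinates, illustrates that rotation-then-translation sequence.

```python
# Illustrative sketch of the normalization transform: rotate the cornea center
# about the origin onto the z-axis (rotation about x, then about y), then
# translate along z to a predetermined depth.  z_target is an assumed value.
import numpy as np

def normalization_transform(cornea, z_target=0.05):
    cx, cy, cz = cornea
    ax = np.arctan2(cy, cz)                            # rotation about x zeroes the y-component
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    c1 = Rx @ cornea
    ay = np.arctan2(-c1[0], c1[2])                     # rotation about y zeroes the x-component
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    R = Ry @ Rx
    t = np.array([0.0, 0.0, z_target - np.linalg.norm(cornea)])   # translation has only a z value
    return R, t

cornea = np.array([0.004, -0.003, 0.045])              # placeholder cornea center (meters)
R, t = normalization_transform(cornea)
normalized_cornea = R @ cornea + t                      # lies on the z-axis at z_target
```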
To determine the pupil position on the cornea, the 3D pupil position on the image plane is retrieved, and the line through the origin and that image pupil point is intersected with the non-normalized cornea. That point on the cornea is then normalized along with the cornea.
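For illustration, assuming the cornea is modeled as a sphere of placeholder radius, this may be computed as a ray-sphere intersection as sketched below.

```python
# Illustrative sketch: intersect the ray from the camera origin through the
# image pupil point with a sphere standing in for the (non-normalized) cornea.
import numpy as np

def ray_sphere_intersection(direction, center, radius):
    """Nearest intersection of the ray t * d (origin at the camera) with a sphere."""
    d = direction / np.linalg.norm(direction)
    b = np.dot(d, center)
    disc = b * b - (np.dot(center, center) - radius ** 2)
    if disc < 0:
        return None                                    # ray misses the sphere
    t = b - np.sqrt(disc)                              # nearer of the two solutions
    return t * d

pupil_on_image = np.array([0.001, -0.0005, 0.004])     # placeholder 3D pupil point on the image plane
cornea_center = np.array([0.004, -0.003, 0.045])       # placeholder cornea center
pupil_on_cornea = ray_sphere_intersection(pupil_on_image, cornea_center, 0.0078)  # assumed corneal radius (m)
```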
Once the cornea, and the pupil point on it, have been normalized, the next step is to determine the normalized pupil on the image plane, e.g., at the normalization step 118 shown in
Normalization puts the cornea in a specific position in the endo-camera coordinate system. Because the cornea does not move relative to the screen, and both are fixed in space for the instant of a given frame, moving the cornea effectively moves the screen as well. The cameras, the virtual plane, and the frames are likewise all fixed together. So when normalization moves the cornea into the specific position in the endo-camera coordinate system, it is functionally the same as the cornea remaining still and the coordinate system moving. The new normalized pupil center is projected onto the virtual plane, but because the virtual plane moved with the endo coordinate system, the resulting gaze point would be wrong. The virtual plane must therefore be denormalized to return it to the proper position for the gaze point, e.g., as shown in
Normalization Step:
For practical implementation, a mobile gaze-determination system must be robust to small shifts in frame position relative to the face for a given user, in addition to accommodating unrestricted head movement. Both conditions may be accomplished through proper normalization of the endo- (toward the eye) and exo- (outward-looking) spaces relative to the viewing plane.
For 3D POR, the gaze point is determined by convergence of the left and right eye gaze vectors. This information may then be relayed to the user through the mobile device as an overlay on the exo-camera (scene) video images.
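A minimal sketch of such convergence, taking the midpoint of the shortest segment between the two gaze rays and using illustrative eye positions and directions, follows.

```python
# Illustrative sketch of 3D point-of-regard: the two gaze rays rarely intersect
# exactly, so the midpoint of their closest approach is taken as the gaze point.
import numpy as np

def ray_convergence(o1, d1, o2, d2):
    """Midpoint of the closest approach between rays o1 + s*d1 and o2 + t*d2."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w = o1 - o2
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w), np.dot(d2, w)
    denom = a * c - b * b                   # near zero when the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

left_origin, left_dir = np.array([-0.03, 0.0, 0.0]), np.array([0.05, 0.0, 1.0])    # placeholder left gaze ray
right_origin, right_dir = np.array([0.03, 0.0, 0.0]), np.array([-0.05, 0.0, 1.0])  # placeholder right gaze ray
por_3d = ray_convergence(left_origin, left_dir, right_origin, right_dir)
```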
Point-of-Regard Step: Next, at step 124 of
When the point of gaze data is integrated into a more elaborate user interface with cursor control, eye movements may be used interchangeably with other input devices, e.g., that utilize hands, feet, and/or other body movements to direct computer and other control applications.
It will be appreciated that elements or components shown with any embodiment herein are exemplary for the specific embodiment and may be used on or in combination with other embodiments disclosed herein.
While the invention is susceptible to various modifications and alternative forms, specific examples thereof have been shown in the drawings and are described herein in detail. It should be understood, however, that the invention is not to be limited to the particular forms or methods disclosed, but, to the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the scope of the appended claims.
This application claims benefit of co-pending provisional applications Ser. Nos. 61/734,354, 61/734,294, and 61/734,342, all filed Dec. 6, 2012. This application is also related to applications Ser. Nos. 12/715,177, filed Mar. 1, 2010, 13/290,948, filed Nov. 7, 2011, and U.S. Pat. No. 6,541,081. The entire disclosures of these references are expressly incorporated by reference herein.
The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Department of Defense (US Army) Contract No. W81XWH-05-C-0045, U.S. Department of Defense Congressional Research Initiatives No. W81XWH-06-2-0037, W81XWH-09-2-0141, and W81XWH-11-2-0156; and U.S. Department of Transportation Congressional Research Initiative Agreement Award No. DTNH 22-05-H-01424.