SYSTEMS AND METHODS FOR EYE GAZE DETERMINATION

Abstract
Devices and methods are provided for eye and gaze tracking determination. In one embodiment, a method for compensating for movement of a wearable eye tracking device relative to a user's eye is provided that includes wearing a wearable device on a user's head such that one or more endo-cameras are positioned to acquire images of one or both of the user's eyes, and an exo-camera is positioned to acquire images of the user's surroundings; calculating the location of features in a user's eye that cannot be directly observed from images of the eye acquired by an endo-camera; and spatially transforming camera coordinate systems of the exo- and endo-cameras to place calculated eye features in a known location and alignment.
Description
FIELD OF THE INVENTION

The present invention relates generally to systems and methods for eye tracking that are implemented for gaze determination, e.g., determining locations in space or object(s) being viewed by one or both eyes. In particular, the gaze-determination systems and methods herein may enable point-of-gaze determination in a wearable device without the need for head-tracking after calibration.


BACKGROUND

The systems and methods herein relate to gaze tracking using a wearable eye-tracking device that utilizes head-pose estimation to improve gaze accuracy. The use of head-tracking allows the system to know the user's head position in relation to the monitor. This enables the user to accurately interact with an electronic display or other monitor (e.g., control a pointer) using his/her gaze.


Many wearable eye-tracking devices do not include head pose estimation. However, minor shifts in head pose can introduce ambiguity in eye trackers that use the eye visual axis only when determining the gaze vector. Knowledge of the head pose can extend the range of accuracy of a gaze-tracking system.


SUMMARY

The present invention is directed to systems and methods for eye tracking that are implemented for gaze determination, e.g., determining locations in space or object(s) being viewed by one or both eyes. In particular, the gaze-determination systems and methods herein may enable point-of-gaze determination in a wearable device without the need for head-tracking after calibration.


In accordance with an exemplary embodiment, a method is provided for eye tracking that includes one or more steps, such as calibrating a wearable device before the wearable device is worn by a user; placing the wearable device on a user's head adjacent one or both of the user's eyes; calibrating the wearable device after placing the wearable device on the user's head; detecting at least one eye feature of a first eye of the user's eyes; performing a compensation algorithm; and calculating a gaze direction of the user.


In accordance with another embodiment, a system is provided for eye tracking that includes a wearable device configured to be worn on a user's head; an exo-camera on the wearable device configured to provide images of a user's surroundings when the wearable device is worn by the user; an endo-camera on the wearable device configured to provide images of a first eye of the user when the wearable device is worn by the user; and one or more processors configured for one or more of calibrating the wearable device before the wearable device is worn by a user; calibrating the wearable device after placing the wearable device on the user's head; detecting at least one eye feature of a first eye of the user's eyes; performing a compensation algorithm; and calculating a gaze direction of the user.


In accordance with still another embodiment, a method is provided for compensating for movement of a wearable eye tracking device relative to a user's eye that includes wearing a wearable device on a user's head such that one or more endo-cameras are positioned to acquire images of one or both of the user's eyes, and an exo-camera is positioned to acquire images of the user's surroundings; calculating the location of features in a user's eye that cannot be directly observed from images of the eye acquired by an endo-camera; and spatially transforming camera coordinate systems of the exo- and endo-cameras to place calculated eye features in a known location and alignment.


Other aspects and features of the present invention will become apparent from consideration of the following description taken in conjunction with the accompanying drawings.





BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It will be appreciated that the exemplary apparatus shown in the drawings are not necessarily drawn to scale, with emphasis instead being placed on illustrating the various aspects and features of the illustrated embodiments.



FIGS. 1A and 1B are perspective and back views, respectively, of an exemplary embodiment of a wearable gaze tracking device.



FIG. 2 is a flowchart showing an exemplary method for gaze tracking using a wearable device, such as that shown in FIGS. 1A and 1B.



FIG. 3 is a flowchart showing an exemplary method for gaze mapping that may be included in the method shown in FIG. 2.



FIG. 4 is a flowchart showing an exemplary method for pupil detection that may be included in the method shown in FIG. 2.



FIGS. 5 and 6 are schematic representations showing a projected pupil point on a virtual plane after normalization and denormalization using a method, such as that shown in FIG. 2.





DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

The present invention may provide apparatus, systems, and methods for head tracking and gaze tracking that include one or more of the following features:

    • gaze tracking in a system that allows unrestricted movement of the head;
    • gaze tracking in a system that is robust to small shifts in frame position relative to the face for a given user;
    • gaze point registration with scene image with or without head tracking;
    • the storage of the user's calibration data for use with a single headset at a later time.


One of the hurdles to accurate gaze-mapping in a mobile wearable eye-tracking device is finding a user-friendly method to determine head pose information. In many cases, a user is comfortable with a short user-specific calibration. The main advantage of the gaze determination method disclosed herein is that point-of-regard may be maintained with or without head tracking after calibration. This is accomplished by estimating the point in space where the user is looking and projecting it onto the scene image. This allows for gaze determination in a plethora of environments not restricted to a computer desk.


Turning to the drawings, FIGS. 1A and 1B show an exemplary embodiment of a wearable gaze-tracking device 10 that includes a wearable device 12, e.g., a frame for glasses (as shown), or a mask, a headset, a helmet, and the like that is configured to be worn on a user's head (not shown), an exo-camera 20 (mounted on the device to image the user's surroundings), and one or more endo-cameras 30 (mounted on the device to image one or both of the user's eyes). In addition, the device 10 may include one or more light sources, processors, memory, and the like (not shown) coupled to other components for operating the device 10 and/or performing the various functions described herein. Exemplary components, e.g., wearable devices, cameras, light sources, processors, communication interfaces, and the like, that may be included in the device 10 are disclosed in U.S. Pat. Nos. 6,541,081 and 7,488,294, and U.S. Publication Nos. 2011/0211056 and 2013/0114850, the entire disclosures of which are expressly incorporated by reference herein.


Turning to FIG. 2, an exemplary method for gaze mapping and determination is shown. Although the steps are shown in an exemplary sequence, the steps may optionally be performed in a different order than that shown. Generally, the method includes a calibration step 110 in which the wearable device (e.g., device 10) is calibrated, a marker detection step 112, a pupil detection step 114, a glint detection step 116, a normalization step 118, a user calibration step 120, a gaze mapping step 122, and a three-dimensional (3D) point-of-regard (POR) step 124. Step 112, head pose estimation, typically operates substantially continuously. Once the user has placed the device upon their head or face, gaze determination (steps 114-124), including user calibration step 120, generally begins with i) pupil detection 114, and ii) glint location (identifying glints reflected off of one or both eyes in images acquired by the endo-camera(s) 30), where i) and ii) may also be performed in reverse order (glint detection before pupil detection). The camera-to-camera calibration step 110 (calibrating the endo-camera(s) 30 and exo-camera 20) is generally performed prior to the user placing the wearable device on their face, e.g., as described below.


Illumination Source Calibration Step: The first step in calibrating the glint locations in endo-camera images with the light source locations on the wearable device is to acquire a set of perspective images with a secondary reflective surface and light source(s). For example, images of a mirror placed near the working distance of the camera may be acquired, where the mirror's edges are surrounded by LEDs and the mirror is placed in front of the camera such that the glint-LEDs may be seen in the image. The second step is to use a software program to mark and extract the positions of the light sources surrounding the mirror and the reflections in the mirror of the glint-generating light sources on the wearable device. The next step is to determine the homography between the image and the plane of the reflective surface. The aforementioned homography is subsequently applied to the glint light source in the image plane to get the three-dimensional (3D) point corresponding to the light source on the reflective surface. With the 3D locations in space, the ray originating at the light source that generated the glint on the reflective surface may be determined. These steps are repeated for each of the perspective images. The intersection of the calculated ray vectors is determined for each glint source for each perspective image acquired.
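By way of illustration only, the final ray-intersection operation may be sketched as follows. Because rays measured from different perspective images rarely meet exactly, this sketch assumes a least-squares formulation (the point minimizing the summed squared distance to all rays), which is one common choice and is not stated in the description above. The function name and the array layout are hypothetical.

```python
import numpy as np

def nearest_point_to_rays(origins, directions):
    """Return the 3D point closest, in the least-squares sense, to a set of rays.

    origins, directions: arrays of shape (n, 3), one ray per perspective image
    for a single glint-generating light source (hypothetical layout).
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    lhs = np.zeros((3, 3))
    rhs = np.zeros(3)
    for origin, direction in zip(origins, directions):
        # Projector onto the plane perpendicular to this ray's direction.
        projector = np.eye(3) - np.outer(direction, direction)
        lhs += projector
        rhs += projector @ origin
    # Normal equations of the summed squared point-to-ray distances.
    return np.linalg.solve(lhs, rhs)
```

The solve fails if all rays are parallel, so the perspective images should view the reflective surface from sufficiently different angles.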


Camera-Camera Calibration Step: Two standard checkerboards and/or other known geometric patterns are positioned such that one pattern substantially fills the field of view of each of the exo-camera 20 and the endo-camera(s) 30, e.g., positioned at a near optimal working distance of the respective camera, i.e., the object is at near best focus. The position of the checkerboards remains substantially fixed during camera-to-camera calibration. The wearable device is moved between the patterns, while several sets of images with the patterns in full view are acquired from both the endo-camera(s) 30 and exo-camera 20 (eye and scene camera, respectively). Each set of images yields a set of 3 equations. Multiple sets of images yield an overdetermined matrix equation of the form Ax=B. The matrix equation may be solved with SVD to get the camera-to-camera transformation.
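By way of illustration only, solving an overdetermined system Ax=B with SVD may be sketched as follows. The description above does not specify how the rows of A or the unknown vector x are laid out, so the sketch shows only the generic least-squares solve; the function name is hypothetical and B is assumed to be a one-dimensional vector.

```python
import numpy as np

def solve_least_squares_svd(a, b):
    """Least-squares solution of a @ x = b using the singular value decomposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u, s, vt = np.linalg.svd(a, full_matrices=False)

    # Invert only the singular values that are safely above round-off.
    tol = max(a.shape) * np.finfo(float).eps * s.max()
    s_inv = np.zeros_like(s)
    keep = s > tol
    s_inv[keep] = 1.0 / s[keep]

    # x = V * diag(1/s) * U^T * b (the pseudo-inverse applied to b).
    return vt.T @ (s_inv * (u.T @ b))
```

Discarding near-zero singular values keeps the solution stable when some image sets contribute nearly redundant equations.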


In addition, the calibration step 110 may then include a User-Specific Calibration. In this step, codes displayed on the monitor in the exo-camera images are registered with an established monitor plane. This provides an estimate of head-pose at each calibration and test point in the user's calibration session. The codes may come in the form of a variety of patterns comprising contrasting geometric phenomena. The patterns may be displayed on the monitor, constructed of other materials and attached to the monitor, formed by a series of light sources arranged in a pattern around the monitor, and the like. Additionally, head pose may be estimated using an accelerometer, MEMS device, or other orientation sensor. In the past, accelerometers were bulky, but their overall footprint has been significantly reduced with the incorporation of MEMS technology.


In an exemplary embodiment, user-specific calibration may be performed with mapping techniques, wherein mapping refers to a mathematical function. The function takes raw data as its input and evaluates to calibrated points. For example, a polynomial fit is applied to an entire space, and an output value for any point within that space is determined by the function.
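By way of illustration only, a polynomial mapping of this kind may be sketched as follows. The second-order form, the two-dimensional input and output, and all function names are assumptions made for the example; the description above does not fix the polynomial order.

```python
import numpy as np

def poly_features(points):
    """Second-order 2D polynomial terms [1, x, y, xy, x^2, y^2] for each point."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_mapping(raw_points, calibrated_points):
    """Fit coefficients mapping raw data to calibrated points by least squares."""
    targets = np.asarray(calibrated_points, dtype=float)
    coeffs, *_ = np.linalg.lstsq(poly_features(raw_points), targets, rcond=None)
    return coeffs  # shape (6, 2): one column per output coordinate

def apply_mapping(coeffs, raw_points):
    """Evaluate the fitted polynomial at any raw point within the space."""
    return poly_features(raw_points) @ coeffs
```

With six coefficients per output coordinate, at least six well-spread calibration points are needed; a 3x3 calibration grid, for example, gives an overdetermined fit.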


In another exemplary embodiment, user-specific calibration may be performed with interpolation. While mapping covers an entire space of interest, interpolation is performed in a piecewise fashion on specific subregions and localized data. For example, the entire space may be subdivided into four subregions, and linear fits may be applied to each of those subregions by using a weighted average of the corner points of each region. If the number of subregions is increased, the interpolation approaches the polynomial fit of the prior exemplary embodiment.
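By way of illustration only, the weighted average of the corner points of one subregion may be sketched as bilinear interpolation. The rectangular, axis-aligned subregion and the argument names are assumptions for the example; selecting which subregion contains the raw point is left to the caller.

```python
import numpy as np

def bilinear_interpolate(point, region, corner_values):
    """Interpolate a calibrated output inside one rectangular subregion.

    point: raw (x, y) location inside the subregion.
    region: (x0, y0, x1, y1) bounds of the subregion in raw coordinates.
    corner_values: calibrated outputs at the corners, ordered
        (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    x, y = point
    x0, y0, x1, y1 = region
    tx = (x - x0) / (x1 - x0)  # 0 at the left edge, 1 at the right edge
    ty = (y - y0) / (y1 - y0)  # 0 at the first row, 1 at the second row
    v00, v10, v01, v11 = (np.asarray(v, dtype=float) for v in corner_values)
    # Each corner is weighted by the area of the opposite sub-rectangle.
    return ((1 - tx) * (1 - ty) * v00 + tx * (1 - ty) * v10
            + (1 - tx) * ty * v01 + tx * ty * v11)
```

The four weights always sum to one, so the result reproduces the corner values exactly at the corners and varies linearly along each edge.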


In another exemplary embodiment, user-specific calibration may be performed with machine learning. While machine learning may appear to behave like the mathematical functions applied in the mapping method, machine learning techniques may internally represent highly irregular mappings that would otherwise require extremely complex mathematical equations, such as discontinuous functions and high-order polynomials. Machine learning techniques also make no assumptions about the types of equations they will model, meaning that the training procedure is identical regardless of the type of mapping it will ultimately represent. This eliminates, among other things, the need for the author to understand the relationship between inputs and outputs. Such techniques may also execute very quickly, making them useful in high-performance applications.


Head-Pose Estimation Step: In the head-pose estimation step 112 in FIG. 2, each exo-camera (scene) image in the video sequence is first pre-processed and thresholded to acquire marker candidates as contours. The candidates are evaluated for contour size, roundness, and corner count. The final candidates are extracted and matched to marker codes stored within the system directories. The user's head pose and orientation are calculated relative to the marker corners.


The following steps may occur during and/or after the user calibration step:


Pupil Detection Step: An exemplary embodiment of the pupil detection step 114 in FIG. 2 is shown in FIG. 4. One potential method for pupil detection is to first apply a blob detector, e.g., MSER, to a downsized and thresholded image to identify regions similar in features to a pupil from endo-camera images. The blob detector may, for example, be constrained to find circularity (e.g., eccentricity, low order moments, and the like) and stable regions that resemble a pupil. After a suitable region of interest is identified, an algorithm such as Dense Stage I Starburst may be applied to find pupil edges, while ignoring glints. Finally, an ellipse is fitted to the pupil edge, for example using methods such as RANSAC or Hough transforms. Exemplary methods are disclosed in Chinese Publication No. CN102831610 and U.S. Pat. No. 7,110,568, the entire disclosures of which are expressly incorporated by reference herein.
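By way of illustration only, the final ellipse-fitting operation may be sketched as follows. For brevity the sketch substitutes a plain algebraic least-squares conic fit for the RANSAC or Hough approaches named above; a RANSAC variant would repeat such a fit on random subsets of edge points and keep the fit with the most inliers. The function name and the choice to return only the center are assumptions for the example.

```python
import numpy as np

def fit_ellipse_center(edge_points):
    """Fit a conic to pupil edge points and return the ellipse center (x, y)."""
    pts = np.asarray(edge_points, dtype=float)
    # Normalize coordinates to improve numerical conditioning.
    mean = pts.mean(axis=0)
    scale = pts.std()
    x, y = ((pts - mean) / scale).T

    # Conic: a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # The coefficients are the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(design)
    a, b, c, d, e, _ = vt[-1]
    if b * b - 4 * a * c >= 0:
        raise ValueError("fitted conic is not an ellipse")

    # The center is where the gradient of the conic vanishes.
    center = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return center * scale + mean
```

At least five edge points are required, and points spread around the full pupil boundary give a more reliable fit than points clustered on one side.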


Glint Detection Step: For the glint detection step 116 of FIG. 2, in an exemplary embodiment, an adaptive threshold is first applied to a selected subwindow of the full-resolution image, where the threshold value is based on the mean and median intensity of the iris. The image contrast is enhanced. Then, the glints are segmented out of the image through a combination of the threshold and edge detection. Dilation and erosion filters are applied to the segmented glints to remove noise. The contours of the glint candidates are determined. The glint candidates are screened against predetermined parameters of an actual glint, e.g., size, shape regularity, eccentricity, and the like. The actual glints are selected from the final pool of candidates based on geometric constraints.


Cornea Center Calculation Step: Next, the normalization step 118 of FIG. 2 may be performed. The locations of the light sources on the device 10 that produce the glints reflected at the anterior corneal surface and the intrinsic parameters of the eye tracking camera are known. The cornea is assumed to be substantially spherical. Each glint establishes a trajectory of possible cornea center positions in three-dimensional (3D) space. Each trajectory pair generates a 3D location at which the cornea center resides. The corneal center coordinates are calculated using the aforementioned information together with a default corneal radius of curvature that matches the population average.


Once the gaze vector in the endo-camera coordinate system is obtained, it may be mapped to either a point-of-regard (POR) overlay, or the monitor plane if mouse or pointer control is required. In the case of two-dimensional (2D) POR and pointer control, head pose continues to be calculated. In either scenario, accurate gaze determination with unrestricted head movement may be accomplished through proper normalization and denormalization of the endo-camera (toward the eye) and exo-camera (outward-looking) spaces relative to a virtual plane. The image pupil point is projected onto the virtual plane (the mapping between the endo-, exo-, and virtual coordinate spaces is determined during calibration). Then the gaze point is found by intersecting the line formed by the cornea center and the virtual plane point with the monitor.
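By way of illustration only, intersecting the line through the cornea and the virtual plane point with the monitor may be sketched as a standard line-plane intersection. The sketch assumes all points are already expressed in one common coordinate system and that the monitor is described by a point on its plane and a normal vector; the function and argument names are hypothetical.

```python
import numpy as np

def intersect_line_with_plane(cornea, virtual_point, plane_point, plane_normal):
    """Return the point where the line cornea -> virtual_point meets a plane."""
    cornea = np.asarray(cornea, dtype=float)
    direction = np.asarray(virtual_point, dtype=float) - cornea
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = plane_normal @ direction
    if abs(denom) < 1e-12:
        raise ValueError("gaze line is parallel to the plane")
    # Solve plane_normal . (cornea + t * direction - plane_point) = 0 for t.
    t = plane_normal @ (plane_point - cornea) / denom
    return cornea + t * direction
```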


Because the frames are not fixed to the person, the user could move the frames while still looking at the same spot on the virtual plane. A processor analyzing the endo-camera images would detect that the center of the eye moved and project it to a different spot on the virtual plane. To rectify this problem, the center of the pupil is normalized. The cornea center is used as a reference point, and in every frame it is transformed to a specific, predetermined position. The normalized pupil position is then determined on the shifted cornea image.


Essentially, the normalization puts the cornea in the same position in every frame of the endo-camera images, e.g., within an x-y-z reference frame. First, the normalization includes a rotation that will rotate the cornea about the origin and put it on the z-axis. This rotation is determined by restricting the rotation to a combination of rotation around the x-axis followed by rotation around the y-axis. The translation is determined by calculating the required translation to move the rotated cornea to a predetermined value on the z-axis. Because of the rotation done before translation, the translation only contains a z value. FIG. 5 shows how the pupil position may be found on the cornea.
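By way of illustration only, the rotation and translation described above may be sketched as follows. The angle formulas follow directly from requiring the y-component to vanish after the x-axis rotation and the x-component to vanish after the y-axis rotation; the function name and the `z_target` parameter (the predetermined value on the z-axis) are hypothetical.

```python
import numpy as np

def normalization_transform(cornea, z_target):
    """Return (rotation, translation) that move the cornea center to (0, 0, z_target)."""
    cornea = np.asarray(cornea, dtype=float)
    cx, cy, cz = cornea

    # Rotation about the x-axis that zeroes the y-component.
    alpha = np.arctan2(cy, cz)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(alpha), -np.sin(alpha)],
                      [0.0, np.sin(alpha), np.cos(alpha)]])
    z_after_x = (rot_x @ cornea)[2]

    # Rotation about the y-axis that zeroes the x-component.
    beta = np.arctan2(-cx, z_after_x)
    rot_y = np.array([[np.cos(beta), 0.0, np.sin(beta)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(beta), 0.0, np.cos(beta)]])

    rotation = rot_y @ rot_x  # x-axis rotation is applied first
    # After rotation the cornea lies on the z-axis, so only z needs shifting.
    translation = np.array([0.0, 0.0, z_target - (rotation @ cornea)[2]])
    return rotation, translation
```

Any point p expressed in the endo-camera coordinate system is then normalized as `rotation @ p + translation`.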


To determine the pupil position on the cornea, the pupil position on the image plane in 3D is retrieved and then the intersection of the line formed by the pupil on the image and origin with the non-normalized cornea is found. That point on the cornea is then normalized along with the cornea.
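By way of illustration only, finding the pupil position on the cornea may be sketched as a ray-sphere intersection, with the camera center at the origin and the cornea modeled as a sphere. The corneal radius is passed in as a parameter rather than fixed, and the choice of the intersection nearer the camera is an assumption of the sketch; all names are hypothetical.

```python
import numpy as np

def pupil_on_cornea(pupil_on_image, cornea_center, cornea_radius):
    """Intersect the ray origin -> pupil_on_image with the corneal sphere.

    Returns the intersection nearest the camera, or None if the ray misses.
    """
    direction = np.asarray(pupil_on_image, dtype=float)
    direction = direction / np.linalg.norm(direction)
    center = np.asarray(cornea_center, dtype=float)

    # Solve |t * direction - center|^2 = radius^2 for the distance t.
    along = direction @ center
    discriminant = along**2 - center @ center + cornea_radius**2
    if discriminant < 0:
        return None
    t = along - np.sqrt(discriminant)  # smaller root: surface facing the camera
    return t * direction
```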


Once the cornea and pupil are normalized on the cornea, the next step is to determine the normalized pupil on the image plane, e.g., at the normalization step 118 shown in FIG. 2. This is the intersection of the line formed by the normalized pupil on the cornea and origin with the image plane. FIG. 5 demonstrates this.


Normalization puts the cornea in a specific position in the endo-camera coordinate system. Since the cornea does not move relative to the screen, the screen moves as well. They are both fixed in space for the instance of this frame. The cameras and virtual plane are all fixed together as well as the frames. So when normalization moves the cornea into the specific position in the endo-camera coordinate system, it is functionally the same as the cornea remaining still and the coordinate system moving. The new normalized pupil center is projected onto the virtual plane but because the virtual plane moved with the endo coordinate system, the gaze point right now would be wrong. The virtual plane must now be denormalized to return it to the proper position for the gaze point, e.g., as shown in FIG. 6.


Normalization Step: FIG. 3 shows an exemplary method for performing the normalization step 118 shown in FIG. 2. The cornea center is rotated about the origin to lie on the z-axis in the endo-camera coordinate system (eye camera coordinate system). Rotation is performed about the x-axis first, then the y-axis. The rotated cornea position is translated to a constant, predefined position along the z-axis. The next step is to transform pupil center data from image pixels to the image plane in units of millimeters. Now the point where the line passing through the endo-camera center and the pupil center on the image plane intersects the cornea may be determined. The cornea is assumed to be a sphere of known radius centered at the normalized cornea center position. The intersection point is endo-normalized and scaled such that it lies on the image plane, and transformed back into pixels. The normalized pupil is then projected onto a virtual plane, where the polynomial projection function is user-dependent and generated during user calibration. The display origin and normal vector are transformed to the exo-camera coordinate system (scene camera coordinate system). The next step is to transform the cornea center to exo-camera coordinates, followed by transforming the endo-normalization into the exo-camera coordinate system to obtain the exo-normalization transformation. The inverse of the exo-normalization transformation is applied to the projected normalized pupil point in the exo-camera coordinate system, e.g., as shown in FIG. 6. The intersection of the line (through the exo-cornea and the de-normalized projected normalized pupil) with the exo-screen plane is determined. The final step is to transform the result of that intersection to the screen coordinate system of the monitor, and then to pixels to obtain the gaze point on the monitor.


For practical implementation, a mobile gaze-determination system must be robust to small shifts in frame position relative to the face for a given user in addition to accommodating unrestricted head movement. Both conditions may be accomplished through proper normalization of the endo- (toward the eye) and exo- (outward-looking) spaces relative to the viewing plane.


For 3D POR, the gaze point is determined by convergence of the left and right eye gaze vectors. The information may then be relayed to the user through the mobile device as an overlay on the exo-camera (scene) video images.


Point-of-Regard Step: Next, at step 124 of FIG. 2, a 3D POR overlay may be performed. The left gaze line is defined by the de-normalized projected normalized pupil and the cornea in the exo-camera coordinate system for the left eye. The same procedure is applied to the right eye. The intersection (or closest point of intersection) between the two lines is determined and then projected onto the exo-camera images.
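By way of illustration only, the closest point of intersection between the left and right gaze lines may be sketched as the midpoint of the shortest segment joining the two lines. Taking the midpoint is an assumption of the sketch, since the description above does not say which point is used when the lines do not meet exactly; the names are hypothetical, and each line is given by a point (e.g., the cornea) and a direction.

```python
import numpy as np

def closest_point_between_lines(p_left, d_left, p_right, d_right):
    """Midpoint of the shortest segment between two 3D lines."""
    p1, d1 = np.asarray(p_left, dtype=float), np.asarray(d_left, dtype=float)
    p2, d2 = np.asarray(p_right, dtype=float), np.asarray(d_right, dtype=float)

    w = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("gaze lines are parallel; no unique closest point")

    # Parameters of the mutually closest points on each line.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))
```

When the two gaze lines truly intersect, the midpoint coincides with the intersection; otherwise it lies halfway between the two nearest points.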


When the point of gaze data is integrated into a more elaborate user interface with cursor control, eye movements may be used interchangeably with other input devices, e.g., that utilize hands, feet, and/or other body movements to direct computer and other control applications.


It will be appreciated that elements or components shown with any embodiment herein are exemplary for the specific embodiment and may be used on or in combination with other embodiments disclosed herein.


While the invention is susceptible to various modifications, and alternative forms, specific examples thereof have been shown in the drawings and are herein described in detail. It should be understood, however, that the invention is not to be limited to the particular forms or methods disclosed, but to the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the scope of the appended claims.

Claims
  • 1. A method for eye tracking, comprising: a) calibrating a wearable device before the wearable device is worn by a user; b) placing the wearable device on a user's head adjacent one or both of the user's eyes; c) calibrating the wearable device after placing the wearable device on the user's head; d) detecting at least one eye feature of a first eye of the user's eyes; e) performing a compensation algorithm; and f) calculating a gaze direction of the user.
  • 2. The method of claim 1, wherein step c) includes at least one of: i) identifying one or more glints reflected off one or both eyes of the user; and ii) calibrating between an endo-camera configured to acquire images of one eye of the user and an exo-camera configured to acquire images of the user's surroundings.
  • 3. The method of claim 1, wherein step a) comprises computer vision methods.
  • 4. The method of claim 2, wherein step a) is completed after manufacturing the wearable device and before first use of the wearable device.
  • 5. The method of claim 1, wherein step c) comprises at least one of estimating a head pose of the user wearing the wearable device.
  • 6. The method of claim 1, wherein step e) comprises at least one of normalization, denormalization, and spatial transform to correct for movement between the eye and the eye tracking camera.
  • 7. The method of claim 1, wherein step f) comprises calculating a target region within a real or virtual surface or volume, which includes at least one of construction of a vector in space, mapping, and interpolation.
  • 8. A system for eye tracking, comprising: a wearable device configured to be worn on a user's head; an exo-camera on the wearable device configured to provide images of a user's surroundings when the wearable device is worn by the user; an endo-camera on the wearable device configured to provide images of a first eye of the user when the wearable device is worn by the user; and one or more processors configured for: a) calibrating a wearable device before the wearable device is worn by a user; b) calibrating the wearable device after placing the wearable device on the user's head; c) detecting at least one eye feature of a first eye of the user's eyes; d) performing a compensation algorithm; and e) calculating a gaze direction of the user.
  • 9. A method for compensating for movement of a wearable eye tracking device relative to a user's eye, comprising: wearing a wearable device on a user's head such that one or more endo-cameras are positioned to acquire images of one or both of the user's eyes, and an exo-camera is positioned to acquire images of the user's surroundings; calculating the location of features in a user's eye that cannot be directly observed from images of the eye acquired by an endo-camera; and spatially transforming camera coordinate systems of the exo- and endo-cameras to place calculated eye features in a known location and alignment.
RELATED APPLICATION DATA

This application claims benefit of co-pending provisional applications Ser. Nos. 61/734,354, 61/734,294, and 61/734,342, all filed Dec. 6, 2012. This application is also related to applications Ser. Nos. 12/715,177, filed Mar. 1, 2010, 13/290,948, filed Nov. 7, 2011, and U.S. Pat. No. 6,541,081. The entire disclosures of these references are expressly incorporated by reference herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Department of Defense (US Army) Contract No. W81XWH-05-C-0045, U.S. Department of Defense Congressional Research Initiatives No. W81XWH-06-2-0037, W81XWH-09-2-0141, and W81XWH-11-2-0156; and U.S. Department of Transportation Congressional Research Initiative Agreement Award No. DTNH 22-05-H-01424.

Provisional Applications (3)
Number      Date        Country
61734354    Dec 2012    US
61734294    Dec 2012    US
61734342    Dec 2012    US