The disclosure relates generally to computerized image processing, and more particularly to systems and methods for implementing sensor fusion techniques in computerized eye-tracking applications such as in head-mounted displays for virtual reality and/or augmented reality systems with improved features and characteristics.
One current generation of virtual reality (“VR”) experiences is created using head-mounted displays (“HMDs”), which can be tethered to a stationary computer (such as a personal computer (“PC”), laptop, or game console), combined and/or integrated with a smart phone and/or its associated display, or self-contained. VR experiences generally aim to be immersive and disconnect the users' senses from their surroundings.
Generally, HMDs are display devices, worn on the head of a user, that have a small display device in front of one (monocular HMD) or each eye (binocular HMD).
A binocular HMD has the potential to display a different image to each eye. This capability is used to display stereoscopic images.
The term “eye tracking” refers to the process of measuring either the point of gaze (i.e., where a person is looking), what the person is looking at, or the motion or position of a person's eye relative to that person's head. Various computerized eye tracking technologies have been implemented in HMDs and other applications, as ordinarily skilled artisans will readily recognize.
Eye trackers measure rotations of the eye in one of several ways. One broad category of eye tracking technology uses non-contact, optical methods for measuring eye location or gaze angle. For example, in one known class of optical eye tracking technology, light, typically in the infrared region, is reflected from the eye and sensed by a video camera. The information sensed by the video camera is then analyzed to extract gaze direction or location of pupil from changes in reflections. Video-based eye trackers sometimes use the corneal reflection or the center of the pupil as features to track over time.
In the context of HMD implementations, a camera-based eye tracking system may include a back-facing camera attached to the housing of the HMD and pointing at a user's eye(s) (directly or indirectly) as a means to detect a user's eye position(s). The digital data generated by the camera is transmitted via wired or wireless means to an external device such as a computer (or alternatively, to computer resources located on the HMD itself) for processing and analysis. Computer software in such systems executes eye-tracking algorithms known to ordinarily skilled artisans to detect the position of one or both of the user's eyes.
Certain HMDs that include eye-tracking capabilities contain either one or two small displays with lenses and semi-transparent (i.e., “hot”) mirrors embedded in many form factors, such as helmet, eyeglasses (also known as data glasses) or visor. The display units are typically miniaturized and may include CRT, LCD, liquid crystal on silicon (LCoS), or OLED technologies. Hot mirrors provide one possible design approach for eye tracking, and permit the camera or other eye-tracking sensors to get a good view of the eye being tracked. Certain hot mirrors reflect infrared (“IR”) radiation and are transparent to visible light. The hot mirror in certain eye-tracking HMD applications is tilted in front of the eye and allows the IR camera or other eye-tracking sensor to obtain a reflected image of the eye while the eye has a transparent view onto the display screen.
Such optical eye tracking methods are widely used for gaze tracking. Such trackers in certain implementations may require relatively high-resolution cameras capturing at a high frame rate, together with image processing and pattern recognition devices, to track the reflected light or known ocular structures such as the iris or the pupil. In order to be non-invasive and keep costs down, consumer-grade eye tracking solutions currently known in the art have substantial performance limitations that prevent the system from knowing, precisely or with low latency, the location of the subject's pupil and the gaze direction, as is needed to take full advantage of techniques such as foveated rendering. Costly high-resolution, high-frame-rate cameras may provide only limited benefits.
However, certain currently commercially available and relatively inexpensive camera image-based eye trackers for HMD applications are difficult to run at high frequency and with sufficiently low latency, and they may produce results that are noisy and prone to occlusion in certain implementations. Although such systems may not necessarily be noisy because of low resolution or low frame rate, they may not sample at a sufficiently high rate to characterize the actual movement of the eye because they miss activity that takes place between samples or incorrectly determine beginning or end to saccades (rapid eye movements, discussed further below) and thus generate bad velocity and acceleration data causing error in predictions.
To begin to use prediction, and also to avoid missing saccades (which would cause errors in the results), both of which are important for VR, such systems must typically operate at a rate of at least 240 Hz, due to the relatively high speed with which the human eye is known to move or change direction, especially with respect to what is known as saccadic motion. Saccadic motion refers to the unnoticed and sometimes involuntary motion of a person's eyes as they move between planes of focus.
Generally, saccades can be voluntary or involuntary. When a person redirects his or her gaze to look at something, that is a voluntary saccade. A person's eye is constantly performing involuntary micro-saccades which are virtually imperceptible. Micro-saccades can help to refresh the image and edges a person is viewing on the person's retina. If an image does not move on the retina, the rods/cones on the person's retina may become desensitized to the image and the person effectively becomes blind to it.
To detect and measure micro-saccades in an eye-tracking system generally requires a minimum sampling rate of 240 Hz. It is also not generally possible to determine eye motion precisely unless measurements can be performed well enough to decide whether gaze change is a micro-saccade and the gaze is already reverting back onto the object of focus, or whether the eye is instead accelerating away with a voluntary saccade. To improve performance, more frequent and accurate data is required.
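By way of a non-limiting illustration, the effect of sampling rate on the ability to observe a brief eye movement can be sketched with simple arithmetic. The function name and the 25 ms event duration below are assumptions chosen purely for illustration and are not values taken from this disclosure.

```python
def samples_during_event(duration_ms: int, rate_hz: int) -> int:
    """Return how many whole sample periods fit inside an event.

    duration_ms: assumed duration of the eye movement, in milliseconds.
    rate_hz:     sampling rate of the tracker, in samples per second.
    Integer arithmetic is used so the result is exact.
    """
    return (duration_ms * rate_hz) // 1000


if __name__ == "__main__":
    # Hypothetical 25 ms movement observed at three sampling rates.
    for rate_hz in (60, 240, 1000):
        print(rate_hz, "Hz:", samples_during_event(25, rate_hz), "samples")
```

Under these assumed numbers, a 60 Hz tracker captures at most about one sample during the movement, which is too few to estimate velocity or acceleration, whereas a 240 Hz tracker captures several.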
Thus, currently available VR camera-based eye-tracking solutions typically do not perform with enough responsiveness, accuracy, or robustness to realize all the potential value of eye tracking for use in a consumer class HMD device. This is because increasing the frame rate and/or resolution of the eye-tracking camera is complex and expensive. Even if possible, such improvements typically generate more data, which increases bandwidth requirements, makes transmission more difficult, and causes additional central processing unit (“CPU”) and/or graphics processing unit (“GPU”) load to calculate gaze direction. The extra load can either increase system cost or take limited computing time away from the application that is rendering on the display.
Another limitation is related to extreme eye angles, which may force the pupil or corneal reflections to go out of view of the camera in certain camera-based eye-tracking systems.
Eye-tracking solutions supplemented by relatively inexpensive and readily commercially available optical flow sensors are a possible improvement to camera-based systems. Generally, optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. An optical flow sensor is a vision sensor capable of measuring optical flow or visual motion and outputting a measurement based on optical flow.
Optical flow sensors generally generate data pertaining to relative motion, as opposed to systems that provide data pertaining to relative position. The relative motion data may contain slight errors which over time cause drift as the errors accumulate. There are errors with relative position data as well, but they do not generally drift over time.
Various configurations of optical flow sensors exist. One configuration includes an image sensor chip connected to a processor programmed to run an optical flow algorithm. Another configuration uses a vision chip, which is an integrated circuit having both the image sensor and the processor on the same die, allowing for a compact implementation. An example of this is the type of sensor used extensively in computer optical mice.
Optical flow sensors are inexpensive, very precise, and can operate at a 1 kHz rate or higher. However, they typically exhibit low positional accuracy due to their known propensity to drift over time. So while they can provide good relative information on how far a mouse has traveled over a surface over short intervals of time, they cannot tell where the mouse is on the surface or where it is relative to its starting position because small errors accumulate causing large discrepancies. Combined with their low resolution and inability to “see” an entire user's eye or determine at any point where the eye is gazing, they cannot by themselves typically provide a sufficiently accurate position of the eye.
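The accumulation of small errors described above can be illustrated with a minimal sketch. It assumes a stationary target, so that any reported motion is error, and the per-sample bias and noise figures are hypothetical values chosen for illustration rather than measured sensor characteristics.

```python
import random


def integrate_deltas(deltas):
    """Sum per-sample relative displacements into a running position estimate."""
    position = 0.0
    track = []
    for delta in deltas:
        position += delta
        track.append(position)
    return track


def simulate_drift(n_samples, bias_per_sample, noise_std, seed=0):
    """Integrate the output of a relative-motion sensor watching a stationary target.

    The true displacement is zero at every sample, so the returned track is
    purely accumulated error. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    measured = [bias_per_sample + rng.gauss(0.0, noise_std) for _ in range(n_samples)]
    return integrate_deltas(measured)


if __name__ == "__main__":
    # One second of samples at an assumed 1 kHz rate with a tiny constant bias.
    track = simulate_drift(n_samples=1000, bias_per_sample=0.001, noise_std=0.0)
    print("error after 10 samples:  ", track[9])
    print("error after 1000 samples:", track[-1])
```

Each individual error in this sketch is small, but the integrated position error grows with the number of samples, which is why relative-motion data alone cannot provide a reliable absolute position.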
It is desirable to address the current limitations in this art.
By way of example, reference will now be made to the accompanying drawings, which are not to scale.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and not in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons, having the benefit of this disclosure, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein. Reference will now be made in detail to specific implementations of the present invention as illustrated in the accompanying drawings. The same reference numbers will be used throughout the drawings and the following description to refer to the same or like parts.
The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs), and computer instruction signals embodied in a transmission medium (with or without a carrier wave upon which the signals are modulated). For example, the transmission medium may include a communications network, such as the Internet.
In certain embodiments, memory 110 may include without limitation high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include without limitation non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 110 may optionally include one or more storage devices remotely located from the processor(s) 105. Memory 110, or one or more of the storage devices (e.g., one or more non-volatile storage devices) in memory 110, may include a computer readable storage medium. In certain embodiments, memory 110 or the computer readable storage medium of memory 110 may store one or more of the following programs, modules and data structures: an operating system that includes procedures for handling various basic system services and for performing hardware dependent tasks; a network communication module that is used for connecting computing device 100 to other computers via the one or more communication network interfaces and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on; a client application that may permit a user to interact with computing device 100.
Certain figures in this specification are flow charts illustrating methods and systems. It will be understood that each block of these flow charts, and combinations of blocks in these flow charts, may be implemented by computer program instructions. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create structures for implementing the functions specified in the flow chart block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction structures which implement the function specified in the flow chart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flow chart block or blocks.
Accordingly, blocks of the flow charts support combinations of structures for performing the specified functions and combinations of steps for performing the specified functions. It will also be understood that each block of the flow charts, and combinations of blocks in the flow charts, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
For example, any number of computer programming languages, such as C, C++, C# (CSharp), Perl, Ada, Python, Pascal, SmallTalk, FORTRAN, assembly language, and the like, may be used to implement aspects of the present invention. Further, various programming approaches such as procedural, object-oriented or artificial intelligence techniques may be employed, depending on the requirements of each particular implementation. Compiler programs and/or virtual machine programs executed by computer systems generally translate higher level programming languages to generate sets of machine instructions that may be executed by one or more processors to perform a programmed function or set of functions.
The term “machine-readable medium” should be understood to include any structure that participates in providing data which may be read by an element of a computer system. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks and other persistent memory such as devices based on flash memory (such as solid-state drives, or SSDs). Volatile media include dynamic random access memory (DRAM) and/or static random access memory (SRAM). Transmission media include cables, wires, and fibers, including the wires that comprise a system bus coupled to processor. Common forms of machine-readable media include, for example and without limitation, a floppy disk, a flexible disk, a hard disk, a magnetic tape, any other magnetic medium, a CD-ROM, a DVD, or any other optical medium.
Without limitation, head-mounted-displays (“HMD”) that may be used to implement aspects of certain embodiments of the present invention may be tethered to a stationary computer (such as a personal computer (“PC”), laptop, or game console), or alternatively may be self-contained (i.e., with some or all sensory inputs, controllers/computers, and outputs all housed in a single head-mounted device).
Aspects of the present invention in certain embodiments combine optical eye tracking that uses camera-based pupil and corneal reflection detection with optical flow hardware running at a higher frequency. This combination provides the accuracy that can be attained with the former and at the same time adds the desirable precision and latency characteristics of the latter during the periods between the camera-based samples, resulting in a higher performing overall system at a relatively reduced cost.
By augmenting a camera tracker with one or more optical flow sensors pointed at different targets on the visual field (e.g., different points on the surface of a user's eye, such as the iris or the sclera), one can perform sensor fusion to improve precision. By the same token, since the camera image provides an overall picture of eye position, that information can be used to cull occluded optical flow sensors, thus mitigating drift and errors due to blinking, eyelashes, and other structures or phenomena that interfere with the eye-tracking process.
Thus, adding optical flow sensors, which are relatively inexpensive due to their use in commodity mouse peripherals, helps to fill in the gaps temporally with a higher frequency input. They may also extend tracking into periods where the camera-based tracking is not providing data, because of occlusion from the eyelid for example, and aid in providing a redundant data source to improve the quality and validity of the camera-based data.
There are many possible configurations to position the positional camera-based system and optical flow sensors.
The flow sensors are aimed through a narrow field of view and wide depth of field optical element in exemplary implementations. For example, the optics may be tuned to the vascular details in the sclera. Specifically, if the area observed by a sensor is too small, there may not be enough vascular detail in view. On the other hand, if the area is too large, it may be difficult or impossible to resolve the details, and the user's eyelid may be in view too much of the time, which may impair the quality and value of detected data. In certain embodiments, optical flow sensors may be intentionally aimed at a user's eyelid, so as to assist with blink detection and with detecting when sensors aimed at the user's iris and/or sclera are observing eyelid movement, as opposed to eye rotation.
In certain embodiments, the optical flow sensors can be bounced off the same hot mirror that the image camera uses. In other embodiments, a wave guide is located in front of the lens to facilitate imaging of each of the user's eyes. Since the human eye moves around quite a bit, and eyelids can interfere with optical flow during blinks or as they move with the eye, certain embodiments utilize a plurality of optical flow sensors running simultaneously, each pointing at different parts of the eye. The number of sensors depends on the particular requirement of each implementation, and is based on considerations of cost and performance.
The sensors that need to be squelched from sample to sample may be determined by the low-frequency camera-based image tracking component, since the camera image provides an overall picture of eye position, and that information can be used to cull occluded optical flow sensors. Information from other optical flow sensors in the system may also be used for this squelching function. Information from the optical flow sensors can also be used to help identify blinks to help improve the validity of camera-based sample data.
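One possible form of this squelching step is sketched below, by way of example only. The sketch assumes that the camera-based component reports the upper and lower eyelid boundaries as vertical image coordinates (with y increasing downward) and that each optical flow sensor has a known vertical aim point in the same coordinates. The names FlowReading, select_unoccluded, and fused_delta, and the choice of a simple average over the remaining sensors, are hypothetical and are not prescribed by this disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlowReading:
    sensor_id: int
    aim_y: float  # assumed vertical aim point of the sensor, in camera image coordinates
    dx: float     # relative horizontal motion reported for this sample
    dy: float     # relative vertical motion reported for this sample


def select_unoccluded(readings: List[FlowReading],
                      lid_top_y: float,
                      lid_bottom_y: float) -> List[FlowReading]:
    """Keep only sensors whose aim point lies between the eyelids.

    lid_top_y and lid_bottom_y are assumed to come from the most recent
    camera-based sample; y grows downward, so lid_top_y < lid_bottom_y.
    """
    return [r for r in readings if lid_top_y <= r.aim_y <= lid_bottom_y]


def fused_delta(readings: List[FlowReading],
                lid_top_y: float,
                lid_bottom_y: float) -> Optional[Tuple[float, float]]:
    """Average the motion from unoccluded sensors; None if every sensor is squelched."""
    usable = select_unoccluded(readings, lid_top_y, lid_bottom_y)
    if not usable:
        return None
    n = len(usable)
    return (sum(r.dx for r in usable) / n, sum(r.dy for r in usable) / n)


if __name__ == "__main__":
    readings = [
        FlowReading(sensor_id=0, aim_y=10.0, dx=5.0, dy=5.0),  # behind upper eyelid
        FlowReading(sensor_id=1, aim_y=50.0, dx=1.0, dy=0.5),  # on exposed eye surface
        FlowReading(sensor_id=2, aim_y=60.0, dx=2.0, dy=1.5),  # on exposed eye surface
    ]
    print(fused_delta(readings, lid_top_y=30.0, lid_bottom_y=70.0))
```

In this sketch, a return value of None indicates that no optical flow sensor is trusted for the current sample, as may occur during a blink.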
As shown in
Thus, due to the ability of the hot mirror to reflect infrared light, the eye-tracking sensors (325, 335) detect a reflected view of the eye.
In certain embodiments, the eye-tracking camera subsystem (325) operates in the infrared optical frequency range. In certain further embodiments, the eye-tracking apparatus 300 according to aspects of the present invention also comprises a noise squelching system that determines a subset of said one or more optical flow sensors to ignore at any given time based on the camera-based eye position estimate from the eye-tracking camera subsystem.
Depending on the particular requirements of each implementation, the eye-tracking camera subsystem and the array of optical flow sensors may be housed within a head-mounted display.
Thus, sensor fusion techniques according to aspects of the present invention enable the combination of two complementary tracking systems into a system that has the advantages of both, providing high-frame-rate, low-latency, accurate eye tracking at relatively low cost. Whereas certain existing camera-based eye-tracking systems provide regular absolute positioning information for the pupil position, they may not provide this information as often as is necessary for certain applications that could use eye tracking. On the other hand, optical flow sensors can generate relative data at relatively high data rates, but they may provide inaccurate positional data. Sensor fusion techniques according to aspects of the present invention allow a system to combine the positional accuracy of the slow system with the relative data of the fast system to obtain the best of both worlds and provide accurate data at very low latency.
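A minimal sketch of one way such a combination might be structured is given below, without limitation. It assumes that the optical flow deltas have already been scaled into the same units as the camera-based position estimate, and it uses a simple blend factor to correct the integrated estimate whenever a camera sample arrives. The class name GazeFuser, its method names, and the blend parameter are hypothetical; other fusion approaches may equally be used.

```python
class GazeFuser:
    """Combine low-rate absolute samples with high-rate relative samples.

    Between camera samples the estimate is advanced by integrating optical
    flow deltas; each camera sample pulls the estimate back toward an
    absolute position, which bounds the accumulated drift.
    """

    def __init__(self, initial=(0.0, 0.0)):
        self.x, self.y = initial

    def on_flow(self, dx, dy):
        """High-rate update: integrate one relative motion sample."""
        self.x += dx
        self.y += dy
        return (self.x, self.y)

    def on_camera(self, cam_x, cam_y, blend=1.0):
        """Low-rate update: correct toward the camera's absolute estimate.

        blend=1.0 snaps to the camera sample; smaller values apply only
        part of the correction (an illustrative design choice).
        """
        self.x += blend * (cam_x - self.x)
        self.y += blend * (cam_y - self.y)
        return (self.x, self.y)


if __name__ == "__main__":
    fuser = GazeFuser()
    for _ in range(4):                 # four flow samples between camera frames
        fuser.on_flow(0.5, 0.0)
    print("before camera sample:", (fuser.x, fuser.y))
    print("after camera sample: ", fuser.on_camera(1.5, 0.0))
```

In this sketch the position estimate is available after every optical flow sample, while the error introduced by integrating relative data is limited to what accumulates between consecutive camera samples.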
Aspects of the present invention may be implemented in certain embodiments using field-programmable gate arrays (“FPGAs”) and microcontrollers. In such embodiments, one or more microcontrollers manage the high-speed FPGA front-end and package the data stream for delivery back to a host computer over a suitable interface bus (e.g., USB) for further processing.
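For illustration only, the packaging of optical flow samples into a data stream for the host might resemble the following sketch. The record layout (a 32-bit timestamp in microseconds, an 8-bit sensor identifier, and two signed 16-bit motion counts, little-endian with no padding) is an assumption made for this example and is not a format specified by this disclosure or by any particular interface bus.

```python
import struct

# Hypothetical record layout: uint32 timestamp_us, uint8 sensor_id, int16 dx, int16 dy.
# "<" selects little-endian byte order with no alignment padding.
RECORD_FORMAT = "<IBhh"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_sample(timestamp_us: int, sensor_id: int, dx: int, dy: int) -> bytes:
    """Serialize one optical flow sample into a fixed-size record."""
    return struct.pack(RECORD_FORMAT, timestamp_us, sensor_id, dx, dy)


def unpack_stream(payload: bytes):
    """Host side: split a payload into (timestamp_us, sensor_id, dx, dy) tuples."""
    if len(payload) % RECORD_SIZE != 0:
        raise ValueError("payload length is not a whole number of records")
    return [
        struct.unpack_from(RECORD_FORMAT, payload, offset)
        for offset in range(0, len(payload), RECORD_SIZE)
    ]


if __name__ == "__main__":
    payload = pack_sample(1000, 0, 3, -2) + pack_sample(2000, 1, -1, 4)
    print(RECORD_SIZE, unpack_stream(payload))
```

A fixed-size record is used in this sketch so that the host can recover individual samples from the payload without additional framing information.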
In the foregoing descriptions, certain embodiments are described in terms of particular data structures, preferred and optional enforcements, preferred control flows, and examples. Other and further application of the described methods, as would be understood after review of this application by those with ordinary skill in the art, are within the scope of the invention.
While the above description contains many specifics and certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art, as mentioned above. The invention includes any combination or sub-combination of the elements from the different species and/or embodiments disclosed herein.