The present invention relates, generally, to eye-tracking systems and methods and, more particularly, to the processing of image data produced via such systems.
Eye-tracking systems—such as those used in conjunction with desktop computers, laptops, tablets, virtual reality headsets, and other computing devices that include a display—generally include one or more illuminators configured to direct infrared light to the user's eyes and an image sensor that captures images of the user's eyes for further processing. By determining the relative locations of the user's pupils and the corneal reflections produced by the illuminators, the eye-tracking system can accurately predict the user's gaze point on the display.
The processing of image data frames typically requires significant processing power on the edge computing device itself, which tends to increase the cost of device hardware. While in some contexts it might be advantageous to perform image processing using cloud computing, limited network capabilities (e.g., high latency or low bandwidth) can reduce responsiveness and degrade the overall user experience in such contexts.
Accordingly, there is a long-felt need for systems and methods for efficiently processing eye tracking data and using edge processing and/or cloud processing as appropriate. Systems and methods are therefore needed that overcome these and other limitations of the prior art.
Various embodiments of the present invention relate to systems and methods for, inter alia, intelligently switching between cloud processing mode and edge processing mode based on a variety of criteria, such as the desired eye-tracker settings (e.g., minimum tracker rate, ideal tracker rate, and processing mode) and the available network capabilities (e.g., latency, bandwidth, and the like). In some embodiments, the criteria used for determining whether buffered cloud processing is viable include such parameters as upload bandwidth, tolerable processing delay, buffer drain rate, buffer fill rate, and maximum session duration. In accordance with one embodiment, cloud processing may be used instead of available edge processing in cases where there are tangible benefits to doing so, e.g., the cloud processing system provides added functionality, such as improved data analytics or the like.
The present invention will hereinafter be described in conjunction with the appended drawing figures, wherein like numerals denote like elements.
The present subject matter generally relates to improved systems and methods for processing image data produced via eye tracking systems through a hybrid edge (local) and cloud (remote) process, in which the system determines the most desirable processing mode based on various network criteria and performance metrics (e.g., latency and/or bandwidth). In that regard, the following detailed description is merely exemplary in nature and is not intended to limit the inventions or the application and uses of the inventions described herein. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description. In the interest of brevity, conventional techniques and components related to eye-tracking algorithms, image sensors, machine learning systems, cloud computing resources, and digital image processing may not be described in detail herein.
Referring first to
The eye-tracking assembly 120 is configured to observe the facial region 181 (
In the illustrated embodiment, eye-tracking assembly 120 includes one or more infrared (IR) light emitting diodes (LEDs) 121 positioned to illuminate facial region 181 of user 180. Assembly 120 further includes one or more cameras 125 configured to acquire, at a suitable frame-rate, digital images (“eye-tracking image data,” “eye images,” or simply “images”) corresponding to region 181 of the user's face. This image data may be stored in any convenient lossy or lossless image file format, such as JPG, GIF, PNG, TIFF, RAW, or any other format known in the art. In addition—particularly in the context of cloud tracking—various video compression techniques may be used. Suitable video coding formats include, for example, H.264, H.265, VP9, VP10, and/or machine learning based image compression tailored to eye-finding applications. Furthermore, the image data may be further compressed and/or partitioned into packets—with associated metadata—for transmittal over a network (e.g., network 150).
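As a purely illustrative sketch (not taken from the source), one way the compressed image data and its associated metadata might be packaged into packets for transmittal over network 150 is shown below; the field names, header layout, and serialization scheme are assumptions.

```python
import json
import struct
from dataclasses import dataclass, field

@dataclass
class EyeImagePacket:
    """Hypothetical container for one compressed eye image plus its metadata."""
    frame_number: int
    capture_time_s: float                          # capture timestamp, seconds since epoch
    codec: str                                     # e.g., "h264", "h265", or "vp9"
    payload: bytes                                 # compressed image (or video-segment) bytes
    metadata: dict = field(default_factory=dict)   # e.g., device ID, tracker settings

    def serialize(self) -> bytes:
        """Pack a length-prefixed JSON header followed by the raw payload bytes."""
        header = json.dumps({
            "frame_number": self.frame_number,
            "capture_time_s": self.capture_time_s,
            "codec": self.codec,
            "payload_len": len(self.payload),
            "metadata": self.metadata,
        }).encode("utf-8")
        return struct.pack("!I", len(header)) + header + self.payload
```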
In some embodiments, the image data is analyzed locally—i.e., by a processing system located within computing device (or simply “device”) 110 using a corresponding software client (referred to as “edge processing”). In some embodiments, however, processing of image data frames is accomplished using an image processing module (or “processing system”) 162 that is remote from computing device 110—e.g., hosted within a cloud computing system 160 communicatively coupled to computing device 110 over a network 150 (referred to as “cloud processing”).
During cloud processing, processing system 162 performs the computationally complex operations necessary to determine the gaze point from frames of image data; the result is then transmitted back (as eye and gaze data) over the network to computing device 110. An example cloud-based eye-tracking system that may be employed in the context of the present invention may be found, for example, in U.S. patent application Ser. No. 16/434,830, entitled "Devices and Methods for Reducing Computational and Transmission Latencies in Cloud Based Eye Tracking Systems," filed Jun. 7, 2019, the contents of which are hereby incorporated by reference.
Referring now to
“Real-time processing,” as used herein, refers to a mode in which the system is processing eye images with a low enough latency between image capture and result that the user is provided with an interactive experience controlled by his or her eye gaze. For example, real-time processing is typically used with text communicators in which a non-verbal and/or mobility impaired individual gazes at cells containing words or letters to activate the cells and assemble messages that are then spoken aloud by the device's speakers.
“Buffered processing,” in contrast, refers to a mode in which the system is capturing images at a relatively high rate (e.g., 120 Hz to 500+Hz) with most of the system's resources prioritized for capture speed. Once capture is complete—or possibly while capturing is taking place—the eye tracking images are processed with the intent of returning one or more results that are based on processing batches of the images. This processing mode is often used, for example, in diagnostic-style user experiences (e.g., an amblyopia screening tool or the like) in which real-time gaze feedback is not relevant to the user and/or not visualized in the user interface.
Next, at step 303, the system determines whether the edge hardware (i.e., device 110 and any associated hardware) can deliver the performance associated with the desired eye-tracker settings (e.g., the minimum tracker rate, ideal tracker rate, and processing mode). A variety of criteria may be used to make this determination, such as latency and bandwidth, as described in further detail below. In general, initial step 303 involves determining whether the edge device, unassisted by the cloud, can process eye images at speeds up to the static limit for real-time processing (e.g., 60 Hz) and up to a separate limit for buffered processing (e.g., 120 Hz). These limits are compared to the desired eye-tracker settings previously determined.
If it is determined that the edge hardware cannot provide the requested performance at step 303, the system determines, at step 306, whether a cloud processing system is available with the desired performance. If not, then the system “gracefully fails” 308—e.g., provides the user with a message to “try again shortly” or the like. If so, however, then the process continues to step 309, and the system uses cloud processing to process the image data.
Even if it is determined at step 303 that the hardware of device 110 is capable of providing the requested performance (“Y” branch at step 303), the system (at step 304) will determine whether there are benefits to using cloud processing. That is, in some contexts, the cloud processing system 162 may provide additional value through improved analytics, machine learning, or the like. If not, the system continues to step 310, and the edge hardware is used to process the image data.
Otherwise, if there are benefits to using cloud processing, the method continues to step 305, and the system determines whether the cloud processing system can provide the performance required by the desired eye-tracking settings. In this regard, the application may set configuration flags at launch to convey whether it prefers the benefits of cloud processing when available. If so, then the system commences to process the image data using cloud processing (step 309). In general, cloud processing is limited by how much data can be sent per second and how quickly it can be sent, as described in further detail below.
When the system is in cloud processing mode (step 309), the performance will continue to be monitored (e.g., by assessing latency, bandwidth, etc.) to determine whether the performance has dropped below some predetermined acceptable level (step 307). If so, then the system may loop back to decision block 303 and determine whether the edge hardware platform should be used for processing.
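For illustration only, the decision flow just described might be sketched as follows; the class and function names, the static edge limits, and the boolean inputs are assumptions rather than details taken from the source.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Mode(Enum):
    REAL_TIME = auto()
    BUFFERED = auto()

@dataclass
class TrackerSettings:
    min_rate_hz: float     # minimum acceptable tracker rate
    ideal_rate_hz: float   # ideal tracker rate
    mode: Mode             # real-time or buffered processing
    prefer_cloud: bool     # app-level flag: prefer cloud benefits when available

# Static edge limits assumed for illustration (see the step 303 discussion above).
EDGE_REALTIME_LIMIT_HZ = 60.0
EDGE_BUFFERED_LIMIT_HZ = 120.0

def edge_can_deliver(s: TrackerSettings) -> bool:
    """Step 303: can the unassisted edge device meet the desired settings?"""
    limit = EDGE_REALTIME_LIMIT_HZ if s.mode is Mode.REAL_TIME else EDGE_BUFFERED_LIMIT_HZ
    return s.min_rate_hz <= limit

def select_processing_mode(s: TrackerSettings,
                           cloud_can_deliver: bool,
                           cloud_has_added_value: bool) -> str:
    if not edge_can_deliver(s):
        # Step 306: fall back to the cloud if it can deliver; otherwise fail gracefully (308).
        return "cloud" if cloud_can_deliver else "fail_gracefully"
    # Steps 304-305: edge suffices, but prefer the cloud when it adds value
    # (analytics, machine learning, etc.) and can meet the required performance.
    if cloud_has_added_value and s.prefer_cloud and cloud_can_deliver:
        return "cloud"   # step 309
    return "edge"        # step 310
```

Once in cloud mode, the monitoring described above (step 307) would periodically re-run this selection with fresh latency and bandwidth estimates.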
With respect to performance metrics, both the latency of the communication path (i.e., the round-trip time) and the bandwidth of the network (average sustainable data transfer rate) will be considered. If, for example, the latency (round-trip time) is less than some predetermined value (e.g., about 100 ms), then the responsiveness is sufficient for most applications to be used in real-time processing mode.
In one embodiment, the "required (upload) bandwidth" is defined as the minimum tracker rate (discussed above) multiplied by the size of the eye image packet (e.g., in bytes). If this required bandwidth is less than the currently available upload bandwidth, then real-time cloud processing is viable.
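A minimal sketch of this real-time viability check follows, assuming latency is measured in milliseconds and bandwidth in bytes per second; the 100 ms threshold is the example value mentioned above, and the function name is hypothetical.

```python
MAX_ROUND_TRIP_MS = 100.0  # example responsiveness threshold from the text above

def realtime_cloud_viable(round_trip_ms: float,
                          upload_bandwidth_bps: float,
                          min_tracker_rate_hz: float,
                          packet_size_bytes: int) -> bool:
    # Required upload bandwidth: packets per second times bytes per packet.
    required_bandwidth_bps = min_tracker_rate_hz * packet_size_bytes
    return (round_trip_ms < MAX_ROUND_TRIP_MS
            and required_bandwidth_bps < upload_bandwidth_bps)
```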
Buffered cloud processing, on the other hand, is limited primarily by upload bandwidth rather than latency. It is also limited by the available on-device storage allocated to unprocessed eye image packets and by how quickly the system can remove those packets to make room for more. Accordingly, the system defines a "maximum tolerable processing delay" based on the upload bandwidth, and the required bandwidth can then be computed as the minimum tracker rate multiplied by the eye image packet size, all divided by the maximum tolerable processing delay.
The system also determines how quickly it can drain the buffer in order to arrive at a "maximum session duration." That is, the system defines a "buffer drain rate" as the upload bandwidth divided by the eye image packet size, and a "buffer fill rate" as the minimum tracker rate multiplied by the eye image packet size. The maximum session duration is then equal to the allocated storage divided by the net rate at which the buffer grows (i.e., the buffer fill rate less the buffer drain rate), with the rates and the allocated storage expressed in consistent units. If the maximum session duration exceeds the session duration required by the application, and the upload bandwidth exceeds the required upload bandwidth, then cloud processing in this mode is viable.
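The session-duration portion of this check might be sketched as follows, with rates normalized to packets per second and the allocated storage to whole packets so that the units are consistent; the names and the normalization are assumptions, and the separate required-bandwidth comparison described above is omitted for brevity.

```python
def buffered_cloud_viable(upload_bandwidth_bps: float,
                          min_tracker_rate_hz: float,
                          packet_size_bytes: int,
                          allocated_storage_bytes: int,
                          required_session_s: float) -> bool:
    drain_rate = upload_bandwidth_bps / packet_size_bytes  # packets uploaded per second
    fill_rate = min_tracker_rate_hz                        # packets captured per second
    if drain_rate >= fill_rate:
        return True  # the buffer never grows, so the session length is effectively unbounded
    storage_packets = allocated_storage_bytes / packet_size_bytes
    max_session_s = storage_packets / (fill_rate - drain_rate)
    return max_session_s >= required_session_s
```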
The above presents just a few examples of the criteria that can be used by the system to select between cloud processing and edge processing paradigms, and is not intended to be limiting. In some embodiments, for example, the system provides a hybrid cloud-processing mode in which the two archetypical modes (real-time and buffered) are employed substantially simultaneously. The processing systems, modules, and other components described above may employ one or more machine learning or predictive analytics models to assist in carrying out their respective functions (e.g., with respect to bandwidth and latency criteria). In this regard, the phrase "machine learning model" is used without loss of generality to refer to any result of an analysis that is designed to make some form of prediction, such as predicting the state of a response variable, clustering, determining association rules, and performing anomaly detection. Thus, for example, the term "machine learning" refers to models that undergo supervised, unsupervised, semi-supervised, and/or reinforcement learning. Such models may perform classification (e.g., binary or multiclass classification), regression, clustering, dimensionality reduction, and/or other such tasks. Examples of such models include, without limitation, artificial neural networks (ANN) (such as recurrent neural networks (RNN) and convolutional neural networks (CNN)), decision tree models (such as classification and regression trees (CART)), ensemble learning models (such as boosting, bootstrapped aggregation, gradient boosting machines, and random forests), Bayesian network models (e.g., naive Bayes), principal component analysis (PCA), support vector machines (SVM), clustering models (such as K-nearest-neighbor, K-means, expectation maximization, hierarchical clustering, etc.), and linear discriminant analysis models.
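As one minimal illustration (not drawn from the source) of how a lightweight predictive element could feed the bandwidth and latency criteria, the decision module might smooth recent network measurements with an exponentially weighted moving average before applying the viability checks sketched above; any of the heavier machine learning models listed could take the place of this simple estimator.

```python
class LinkEstimator:
    """Exponentially weighted moving-average estimates of upload bandwidth and latency."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha          # weight given to the newest measurement
        self.bandwidth_bps = None   # smoothed upload bandwidth, bytes per second
        self.round_trip_ms = None   # smoothed round-trip latency, milliseconds

    def update(self, measured_bandwidth_bps: float, measured_round_trip_ms: float) -> None:
        if self.bandwidth_bps is None:
            # First sample initializes both estimates.
            self.bandwidth_bps = measured_bandwidth_bps
            self.round_trip_ms = measured_round_trip_ms
        else:
            a = self.alpha
            self.bandwidth_bps = a * measured_bandwidth_bps + (1 - a) * self.bandwidth_bps
            self.round_trip_ms = a * measured_round_trip_ms + (1 - a) * self.round_trip_ms
```

Feeding the smoothed estimates, rather than raw instantaneous measurements, into the edge/cloud decision would make the selected mode less sensitive to momentary network fluctuations.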
In summary, an eye-movement data acquisition system (and/or method) includes an illumination source configured to produce infrared light; a camera assembly configured to receive a portion of the infrared light reflected from a user's face during activation of the illumination source and to produce a plurality of image data frames; an edge processing system communicatively coupled to the camera assembly and the illumination source, the edge processing system configured to produce eye-movement data based on the plurality of image data frames; a cloud processing system communicatively coupled to the camera assembly and the illumination source, the cloud processing system configured to produce the eye-movement data based on the plurality of image data frames; and a decision module configured to select between the edge processing system and the cloud processing system based on a set of desired tracker settings and a set of capabilities of the cloud processing system.
A method for acquiring eye-movement data includes producing infrared light with an illumination source; receiving, with a camera assembly, a portion of the infrared light reflected from a user's face during activation of the illumination source and producing a plurality of image data frames; providing an edge processing system communicatively coupled to the camera assembly and the illumination source, the edge processing system configured to produce eye-movement data based on the plurality of image data frames; providing a cloud processing system communicatively coupled to the camera assembly and the illumination source, the cloud processing system configured to produce eye-movement data based on the plurality of image data frames; selecting between the edge processing system and the cloud processing system based on a set of desired tracker settings and a set of capabilities of the cloud processing system; and transmitting the image data frames to the selected processing system and producing the eye-movement data therewith.
A decision module for use in connection with an eye-movement data acquisition system of the type comprising an illumination source configured to produce infrared light and a camera assembly configured to receive a portion of the infrared light reflected from a user's face during activation of the illumination source and to produce a plurality of image data frames, includes a processor configured to communicate with an edge processing system and a cloud processing system, wherein the edge processing system and the cloud processing system are each configured to produce eye-movement data based on the plurality of image data frames; the processor is further configured to select between the edge processing system and the cloud processing system for processing of the image data frames based on a set of desired tracker settings and a set of capabilities of the cloud processing system.
As used herein, the terms "module" or "controller" refer to any hardware, software, firmware, electronic control component, processing logic, and/or processor device, individually or in any combination, including without limitation: application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), dedicated neural network devices (e.g., Google Tensor Processing Units), virtual machines, electronic circuits, processors (shared, dedicated, or group) configured to execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations, nor is it intended to be construed as a model that must be literally duplicated.
While the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing various embodiments of the invention, it should be appreciated that the particular embodiments described above are only examples, and are not intended to limit the scope, applicability, or configuration of the invention in any way. To the contrary, various changes may be made in the function and arrangement of elements described without departing from the scope of the invention.