Gaze prediction and tracking allows the use of a person's eyes to manipulate input on a device, such as a mobile computing device. Many devices utilize one or more applications to predict a person's eye gaze. While many devices may utilize multiple image sensors, such as one or more cameras integrated into each display or screen of such device, to achieve an increased accuracy when predicting and/or tracking the gaze of the user, there remains room for improvement to increase eye gaze prediction accuracy, and in some instances, head pose prediction accuracy. It is with respect to these and other general considerations that embodiments have been described. Also, although relatively specific problems have been discussed, it should be understood that the embodiments should not be limited to solving the specific problems identified in the background.
Aspects of the present disclosure are directed to predicting eye gaze information of a user and/or predicting head pose information of a user and then using the predicted eye gaze and/or predicted head pose to control one or more functions associated with a computing device or system.
In accordance with at least one aspect of the present disclosure, a method for generating a predicted eye gaze of a user is disclosed. The method may include receiving a first image of a user from a first camera; receiving a second image of the user from a second camera; obtaining a hinge angle between the first camera and the second camera; extracting feature information for a first eye of the user based on the first image and the second image; extracting feature information for a second eye of the user based on the first image and the second image; extracting facial landmark features for the user based on at least one of the first image and the second image; and generating, using an eye gaze predictor, a predicted eye gaze for the user based on the extracted feature information for the first eye of the user, the extracted feature information for the second eye of the user, and the extracted facial landmark features, wherein a confidence level associated with the predicted eye gaze for the user is based on the obtained hinge angle.
In accordance with at least one aspect of the present disclosure, a system for generating at least one of a predicted eye gaze or a predicted head pose of a user is described. The system may include a processor; a first image sensor; a second image sensor; and memory including instructions, which when executed by the processor, cause the processor to: receive a first image of a user from the first image sensor; receive a second image of the user from the second image sensor; obtain a hinge angle between a first display associated with the first image sensor and a second display associated with the second image sensor; extract feature information for a first eye of the user based on the first image and the second image; extract feature information for a second eye of the user based on the first image and the second image; extract facial landmark features for the user based on at least one of the first image and the second image; generate an estimated eye gaze for the user based on the extracted feature information for the first eye of the user, the extracted feature information for the second eye of the user, and the extracted facial landmark features; calculate a first angle of offset between an optical axis of the first image sensor and a visual axis of the user; calculate a second angle of offset between an optical axis of the second image sensor and the visual axis of the user; and generate at least one of a predicted eye gaze for the user or a predicted head pose for the user based on extracted feature information for the first eye of the user, extracted feature information for the second eye of the user, extracted facial landmark features, and at least one of the first and second angles of offset.
In accordance with at least one aspect of the present disclosure, a computer storage medium is described. The computer storage medium may include instructions, which when executed by a processor, cause the processor to: receive a first image of a user from a first image sensor; receive a second image of the user from a second image sensor; obtain a hinge angle between a first display associated with the first image sensor and a second display associated with the second image sensor; extract feature information for a first eye of the user based on the first image and the second image; extract feature information for a second eye of the user based on the first image and the second image; extract facial landmark features for the user based on at least one of the first image and the second image; and generate at least one of a predicted eye gaze for the user or a predicted head pose for the user based on the extracted feature information for the first eye of the user, the extracted feature information for the second eye of the user, and the extracted facial landmark features, wherein the at least one of the predicted eye gaze for the user or the predicted head pose for the user is based on an angle of offset between an optical axis of an image sensor and a visual axis of the user.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Non-limiting and non-exhaustive examples are described with reference to the following Figures.
In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments or examples. These aspects may be combined, other aspects may be utilized, and structural changes may be made without departing from the present disclosure. Embodiments may be practiced as methods, systems, or devices. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and their equivalents.
Aspects of the present disclosure are directed to predicting eye gaze information of a user and/or predicting head pose information of a user and then using the predicted eye gaze and/or predicted head pose to control one or more functions associated with a computing device or system. As the productivity and usability of mobile computing devices, such as but not limited to smartphones, increase, one potential limitation of such devices is display size. Thus, mobile computing devices may include two or more displays, thereby allowing a user to view additional information on a larger composite display made up of two or more smaller displays. In examples, two or more displays may be connected or otherwise coupled to one another utilizing a hinge or other coupling means. As previously mentioned, each of the displays may include an image sensor, such as a camera, to acquire one or more images of the user and/or a video stream comprising one or more images of the user. Such images may be utilized to predict an eye gaze of the user and/or a head pose of the user. For example, simultaneous input streams captured from different image sensors may be provided to one or more trained neural networks, which may generate a more accurate predicted eye gaze and/or predicted head pose for a user located within the fields of view of the image sensors. In examples, the predicted eye gaze and/or predicted head pose may be tracked and may be used to control one or more functions associated with the computing device and/or may be implemented in a synthetically generated image of the user.
In accordance with examples of the present disclosure, a user may initially proceed through an enrollment process utilizing an image sensor configuration, where the image sensor configuration may include two or more image sensors. In some examples, each image sensor may be implemented as part of a display or otherwise associated with a display. In some examples, a hinge angle between each image sensor, or each display associated with the image sensor, may be known or otherwise generated and provided as an input to the trained neural network model, where such hinge angle may be utilized to increase the accuracy of a predicted eye gaze and/or predicted head pose of a user. In examples, during the enrollment process, one or more calibration targets may appear on a display, whereby the predicted eye gaze and/or predicted head pose may be based on the location of the calibration target and the hinge angle between the two display devices. Of course, such implementations are not limited to two image sensors and may include more than two image sensors. Utilizing more than two image sensors may improve the accuracy of the predicted eye gaze and/or predicted head pose during the enrollment process, allowing a user's eye location to be determined more accurately through triangulation. Further, having multiple views of a user's eyes at varying hinge angles may provide a more accurate three-dimensional reconstruction of the user's eye when used in synthetically generated applications and models, such as an avatar. In some examples, the hinge angle between the image sensors (or between the displays associated with the respective image sensors), whether received from a hinge sensor or calculated, may be used to assign a confidence level to a predicted eye gaze and/or a predicted head pose of the user.
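Triangulating an eye location from two cameras can be illustrated with the standard midpoint method: each camera contributes a line of sight toward the detected eye, and the estimate is the midpoint of the shortest segment joining the two lines. This is a minimal sketch, not the disclosure's method; it assumes each camera's position and its line-of-sight direction toward the eye are already expressed in one shared 3-D coordinate frame (for example, derived from the hinge angle and each camera's pixel detection). The function name `triangulate_midpoint` is hypothetical.

```python
# Two-line triangulation by the midpoint method (illustrative sketch).
# Assumption: camera positions (p1, p2) and line-of-sight directions (d1, d2)
# toward the eye are already known in one shared 3-D frame.
Vec3 = tuple[float, float, float]


def _dot(u: Vec3, v: Vec3) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(p1: Vec3, d1: Vec3, p2: Vec3, d2: Vec3,
                         eps: float = 1e-12) -> Vec3:
    """Midpoint of the shortest segment joining lines p1 + t*d1 and p2 + s*d2."""
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("lines of sight are (nearly) parallel; cannot triangulate")
    # Parameters of the closest points on each line (least-squares solution).
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = [pi + t * di for pi, di in zip(p1, d1)]
    q2 = [pi + s * di for pi, di in zip(p2, d2)]
    return tuple((x + y) / 2 for x, y in zip(q1, q2))
```

For two cameras at x = -1 and x = +1 looking inward at 45 degrees, `triangulate_midpoint((-1, 0, 0), (1, 0, 1), (1, 0, 0), (-1, 0, 1))` returns a point at (0, 0, 1). With noisy detections the two lines rarely intersect exactly, which is why the midpoint of their closest approach is used.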
As previously discussed, the predicted eye gaze and/or predicted head pose of a user 102 may be determined based on a previously performed enrollment process conducted by a user, such as user 102. In examples, the one or more targets 118 may be displayed on one or more displays of the dual screen mobile computing device 116 while a hinge angle is obtained, either from a hinge sensor or by calculation. In some examples, the hinge angle may be utilized by the neural network model 114 to generate an eye gaze prediction and/or head pose prediction having increased accuracy. Alternatively, or in addition, the hinge angle may be utilized to identify one or more calibration parameters that are based on the hinge angle; such calibration parameters may be utilized by the neural network model 114 to generate an eye gaze prediction and/or head pose prediction having increased accuracy. Alternatively, or in addition, the hinge angle may be utilized to provide a confidence level associated with a predicted eye gaze and/or predicted head pose of a user; that is, a predicted eye gaze and/or a predicted head pose may be associated with a confidence level based on one or more hinge angles obtained during an enrollment process.
Accordingly, during a calibration process, the offset between the predicted gaze locations and the actual gaze locations can be calculated. Having multiple image sensors providing image information during the calibration process allows the system to extract more samples of the user's appearance. Further, knowing the hinge angle between the two image sensors, or cameras, allows a geometric model of the eyes to be created such that a view direction of the user can be estimated. This information, along with data from multiple sensors, such as but not limited to left and right eye images, facial landmarks, head pose, and other signals, can be provided to a neural network to estimate the location on one of the displays or screens at which the person is looking. Once the gaze estimates are obtained, another process may fine-tune the existing neural network to better fit the user, or a separate network or function may be used to perform the fine-tuning.
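The simplest form of the "separate function" mentioned above is a constant bias correction: average the offset between predicted and actual gaze locations collected during calibration, then add that offset to later predictions. The sketch below assumes gaze locations are 2-D screen coordinates (e.g., pixels); the names `fit_offset` and `apply_offset` are hypothetical, and a real system might instead fine-tune the network itself.

```python
from statistics import fmean

Point = tuple[float, float]  # assumed: on-screen (x, y) gaze location, e.g. pixels


def fit_offset(predicted: list[Point], actual: list[Point]) -> Point:
    """Mean (dx, dy) offset from predicted to actual gaze locations."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need equal, non-empty lists of predicted and actual points")
    dx = fmean(a[0] - p[0] for p, a in zip(predicted, actual))
    dy = fmean(a[1] - p[1] for p, a in zip(predicted, actual))
    return (dx, dy)


def apply_offset(point: Point, offset: Point) -> Point:
    """Correct a new prediction by adding the calibrated offset."""
    return (point[0] + offset[0], point[1] + offset[1])
```

For example, predictions of (100, 200) and (300, 400) against targets at (110, 195) and (312, 393) yield an offset of (11.0, -6.0), which would then be added to subsequent predictions.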
As an initial example, a first display configuration 201 may include a first display 202 having a first camera 203 and a second display 204 having a second camera 205. A hinge angle α1 between the first display 202 and the second display 204 may be obtained. For example, a hinge sensor or a hinge angle sensor may provide a hinge angle corresponding to an angle between each camera associated with a respective display (or each display associated with a respective camera).
Each camera 203 and 205 may have a respective field of view 206 and 207. Accordingly, images of a subject within the fields of view 206 and 207 may be obtained by the first camera 203 and the second camera 205. Such images, together with the hinge angle, may be utilized to generate a predicted eye gaze and/or predicted head pose of the subject. For example, the angular difference between the optical axis and the visual axis, referred to as the Kappa angle, can be calculated and utilized to fine-tune a predicted eye gaze and/or a predicted head pose of the user. Knowing the hinge angle between the two image sensors, or cameras, allows a geometric model of the eyes to be created such that a view direction of the user can be estimated. This information, along with data from multiple sensors, such as but not limited to left and right eye images, facial landmarks, head pose, and other signals, can be provided to a neural network to estimate the location on one of the displays or screens at which the person is looking.
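Computing the Kappa angle reduces to the angle between two direction vectors. The sketch below assumes the optical axis and the visual axis are both available as 3-D direction vectors in the same coordinate frame; how those vectors are estimated is outside its scope, and the function name is hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def angle_between_deg(u: Vec3, v: Vec3) -> float:
    """Angle in degrees between two 3-D direction vectors (e.g. optical vs. visual axis)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos_theta))
```

Because the vectors are normalized inside the function, they need not be unit length; a visual axis tilted 5 degrees from an optical axis along +z yields approximately 5.0.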
A second display configuration 208 may include the first display 202 and the second display 204 having a greater hinge angle than the first display configuration 201. That is, a hinge angle α2 may be greater than the hinge angle α1. Accordingly, a resulting field of view (e.g., FoVR) for the second display configuration 208 may be greater than the resulting field of view for the first display configuration 201. As another example, a third display configuration 209 may include the first display 202 and the second display 204 having a greater hinge angle than both the first display configuration 201 and the second display configuration 208. That is, a hinge angle α3 may be greater than the hinge angle α1 and the hinge angle α2. Accordingly, a resulting field of view for the third display configuration 209 may be greater than the resulting fields of view for the first display configuration 201 and the second display configuration 208. Each display configuration 201, 208, and/or 209 may acquire images of a user from different angles.
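Why a larger angle between the cameras widens the resulting field of view can be shown with a simplified planar model. This sketch rests on assumptions not stated in the disclosure: both cameras share the same horizontal field of view, the baseline between them is ignored, and their optical axes are separated by some angle. How that separation maps to a given hinge angle α depends on the device's hinge-angle convention and is left as an input.

```python
def combined_fov_deg(camera_fov_deg: float, axis_separation_deg: float) -> tuple[float, float]:
    """Return (covered_deg, overlap_deg) for two cameras in one horizontal plane.

    Simplifying assumptions (not from the disclosure): equal horizontal fields of
    view, baseline ignored, optical axes separated by axis_separation_deg.
    """
    if not 0 < camera_fov_deg < 180 or not 0 <= axis_separation_deg <= 180:
        raise ValueError("expected 0 < fov < 180 and 0 <= separation <= 180")
    # The two views overlap only while the axis separation is smaller than one camera's FoV.
    overlap = max(camera_fov_deg - axis_separation_deg, 0.0)
    covered = 2 * camera_fov_deg - overlap
    return float(covered), float(overlap)
```

With 90-degree cameras, a 30-degree separation gives 120 degrees covered with 60 degrees of overlap, while a 120-degree separation gives 180 degrees covered with no overlap. The overlap term matters because only subjects inside it are seen by both cameras at once, which triangulation requires.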
In some examples, the sequence of display configurations 201, 208, and/or 209 may be encountered during an enrollment process in which one or more targets are displayed on each of the displays 202 and/or 204 such that images of the user, the hinge angle between the displays, and the on-screen location of the displayed target may be obtained or recorded. For example, during the enrollment process, one or more display targets may be displayed on the display 202 and/or 204 in the first configuration 201. An image of the user may be obtained from each image sensor or camera 203 and 205, together with the hinge angle α1 and the location of the displayed target. Further, one or more display targets may be displayed on the display 202 and/or 204 in the second configuration 208 such that an image of the user may be obtained from each image sensor or camera 203 and 205, together with the hinge angle α2 and the location of the displayed target. Based on the image from each image sensor or camera 203 and 205, the neural network receives as input one or more sets of landmarks, eye images, and head pose, one set for each of the image sensors. In some examples, when a user is utilizing the dual screen mobile computing device, a confidence level of a predicted eye gaze and/or head pose of the user may be generated based on the similarity between the current hinge angle and the hinge angle of one of the display configurations utilized during the enrollment process.
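One way to turn hinge-angle similarity into a confidence level is a Gaussian falloff on the distance to the nearest enrolled hinge angle. This is a minimal sketch under assumptions not in the disclosure: the Gaussian form and the `sigma_deg` width are hypothetical tuning choices, and the function name is illustrative.

```python
import math


def hinge_confidence(current_deg: float, enrolled_deg: list[float],
                     sigma_deg: float = 10.0) -> float:
    """Confidence in [0, 1] that decays with distance to the nearest enrolled hinge angle.

    The Gaussian falloff and sigma_deg are hypothetical tuning choices.
    """
    if sigma_deg <= 0:
        raise ValueError("sigma_deg must be positive")
    if not enrolled_deg:
        return 0.0  # nothing enrolled: no basis for confidence
    nearest = min(abs(current_deg - a) for a in enrolled_deg)
    return math.exp(-0.5 * (nearest / sigma_deg) ** 2)
```

A current hinge angle that matches an enrolled configuration exactly scores 1.0; one that is 10 degrees away scores about 0.61 with the default width, and one far from every enrolled configuration approaches 0.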
As another example, one or more calibration parameters may be utilized and/or obtained during an eye gaze prediction and/or head pose prediction process based on the enrolled display configuration closest to or most similar to the current display configuration. For example, a display configuration such as the display configuration 208, having a hinge angle α2, may utilize different calibration and/or configuration parameters than a display configuration more similar to configuration 201, having a hinge angle α1. Thus, the one or more calibration parameters associated with the closest display configuration, as determined by the hinge angle for example, may be used to obtain a predicted eye gaze and/or head pose having greater accuracy. Further, the offset between the predicted gaze locations and the actual gaze locations can be included as one of the calibration parameters. That is, having multiple image sensors providing image information during the calibration process allows the system to extract more samples of the user's appearance. Thus, a geometric model of the eyes can be created that allows the view direction of the user to be estimated. This information, along with data from multiple sensors, such as but not limited to left and right eye images, facial landmarks, head pose, and other signals, can be provided to a neural network to estimate the location on one of the displays or screens at which the person is looking. Once the gaze estimates are obtained, another process may fine-tune the existing neural network to better fit the user, or a separate network or function may be used to perform the fine-tuning.
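Selecting calibration parameters by the closest enrolled hinge angle can be sketched as a nearest-neighbor lookup over stored calibration records. The record layout below (`Calibration` with `hinge_deg` and a 2-D `offset_px`) is hypothetical; it stores only the predicted-to-actual offset named above as one possible calibration parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical per-configuration calibration record."""
    hinge_deg: float                # hinge angle at which the record was enrolled
    offset_px: tuple[float, float]  # mean predicted-to-actual gaze offset


def nearest_calibration(hinge_deg: float, table: list[Calibration]) -> Calibration:
    """Pick the enrolled calibration whose hinge angle is closest to the current one."""
    if not table:
        raise ValueError("no enrolled calibrations")
    return min(table, key=lambda c: abs(c.hinge_deg - hinge_deg))


def corrected_gaze(raw_xy: tuple[float, float], hinge_deg: float,
                   table: list[Calibration]) -> tuple[float, float]:
    """Apply the offset from the closest enrolled configuration to a raw prediction."""
    cal = nearest_calibration(hinge_deg, table)
    return (raw_xy[0] + cal.offset_px[0], raw_xy[1] + cal.offset_px[1])
```

With records enrolled at 90 and 150 degrees, a device opened to 140 degrees would use the 150-degree record. Interpolating between the two nearest records is an alternative design, at the cost of assuming the parameters vary smoothly with hinge angle.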
The eye features 310A and 310B may be combined (e.g., concatenated) at 320 with other features, such as facial landmark features 318 obtained from a landmark feature extractor 314, where the combined features may be provided to an eye gaze predictor 322 to generate an eye gaze prediction 324. Similarly, the eye features 310A and 310B may be combined (e.g., concatenated) at 320 with other features, such as facial landmark features 318 obtained from the landmark feature extractor 314, where the combined features may be provided to a head pose predictor 326 to generate a head pose prediction 328. The landmark feature extractor 314 may extract facial landmark features 318 from one or more images of the user. For example, the landmark feature extractor 314 may receive a facial image 312A of a user from the same image utilized to acquire or obtain the image 302A of the first eye and the image 304A of the second eye. Alternatively, or in addition, the landmark feature extractor 314 may receive a facial image 312B of a user from the same image utilized to acquire or obtain the image 302B of the first eye and the image 304B of the second eye. The landmark feature extractor 314 may include a neural network that includes one or more flattening layers and one or more fully connected layers. In examples, the landmark feature extractor 314 may determine and/or detect the user's face and extract the facial landmark features 318, which may include but are not limited to the location of the eyes, pupils, nose, chin, ears, etc. of the user. In some examples, the hinge angle α 330 between first and second displays and/or first and second cameras may be provided to the gaze predictor 322, head pose predictor 326, and/or landmark feature extractor 314 for use in generating the eye gaze prediction 324 and/or head pose prediction 328. 
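The combination at 320 (concatenating the eye features, the facial landmark features, and optionally the hinge angle α 330) followed by a fully connected layer can be illustrated in plain Python. This is a toy sketch: the weights are placeholders rather than a trained eye gaze predictor 322, there is no activation function, and the function names are hypothetical. As noted in this disclosure, a real predictor may be a transformer, a convolutional neural network, or a support vector machine.

```python
from typing import Optional, Sequence


def concatenate_features(eye1: Sequence[float], eye2: Sequence[float],
                         landmarks: Sequence[float],
                         hinge_deg: Optional[float] = None) -> list[float]:
    """Concatenate per-eye and landmark feature vectors, optionally appending the hinge angle."""
    features = [*eye1, *eye2, *landmarks]
    if hinge_deg is not None:
        features.append(hinge_deg)
    return features


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> list[float]:
    """One dense layer: y[i] = sum_j W[i][j] * x[j] + b[i] (no activation)."""
    if len(weights) != len(bias) or any(len(row) != len(x) for row in weights):
        raise ValueError("weight/bias shapes do not match the input")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
```

A gaze head with two output rows would map the concatenated vector to an (x, y) screen location, while a head pose head with three output rows could produce, for example, yaw, pitch, and roll.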
The eye gaze predictor 322 and/or the head pose predictor 326 may include, but are not limited to, a transformer model, a convolutional neural network model, and/or a support vector machine model. In examples, the offset between the predicted gaze locations and the actual gaze locations can be calculated. This information, along with data from multiple sensors, such as but not limited to left and right eye images, facial landmarks, head pose, and other signals, can be provided to a neural network to estimate the location on one of the displays or screens at which the person is looking. Once the gaze estimates are obtained, another process may fine-tune the existing neural network to better fit the user, or a separate network or function may be used to perform the fine-tuning.
The system memory 404 may include an operating system 405 and one or more program modules 406 suitable for running software application 420, such as one or more components supported by the systems described herein. As examples, system memory 404 may include an enrollment engine 421, the gaze predictor 422, the head pose predictor 423, an eye feature extractor 424, and the landmark feature extractor 425. The enrollment engine 421 may perform one or more processes for capturing and associating a hinge angle with a displayed target location, predicted eye gaze, and/or predicted head pose of a user as previously described and as further described herein. The gaze predictor 422 may be the same as or similar to the gaze predictor 322 previously described. The head pose predictor 423 may be the same as or similar to the head pose predictor 326 as previously described. The eye feature extractor 424 may be the same as or similar to the neural network processing pipeline 308 as previously described. The landmark feature extractor 425 may be the same as or similar to the landmark feature extractor 314 as previously described. The operating system 405, for example, may be suitable for controlling the operation of the computing device 400.
Furthermore, embodiments of the disclosure may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in
As stated above, a number of program modules and data files may be stored in the system memory 404. While executing on the processing unit 402, the program modules 406 (e.g., applications 420) may perform processes including, but not limited to, the aspects, as described herein. Other program modules that may be used in accordance with aspects of the present disclosure may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided programs, etc.
Furthermore, embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the disclosure may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in
The computing device 400 may also have one or more input device(s) 412 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. The one or more input devices 412 may include a plurality of image sensors, such as the image sensor 103A and/or 103B. Further, the one or more input devices 412 may include a hinge angle sensor that provides a hinge angle between two or more display devices. The output device(s) 414 such as a plurality of displays, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 400 may include one or more communication connections 416 allowing communications with other computing devices 450. Examples of suitable communication connections 416 include, but are not limited to, radio frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 404, the removable storage device 409, and the non-removable storage device 410 are all computer storage media examples (e.g., memory storage). Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information, and which can be accessed by the computing device 400. Any such computer storage media may be part of the computing device 400. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
In yet another alternative embodiment, the mobile computing device 500 is a portable phone system, such as a cellular phone. The mobile computing device 500 may also include an optional keypad. The optional keypad may be a physical keypad or a “soft” keypad generated on the touch screen display 505A/505B.
In various embodiments, the output elements include the displays 505A and 505B for showing a graphical user interface (GUI), a visual indicator (e.g., a light emitting diode), and/or an audio transducer (e.g., a speaker). In some aspects, the mobile computing device 500 incorporates a vibration transducer for providing the user with tactile feedback. In yet another aspect, the mobile computing device 500 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
One or more application programs 566 may be loaded into the memory 562 and run on or in association with the operating system 564. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, maps programs, and so forth. The system 502 also includes a non-volatile storage area 568 within the memory 562. The non-volatile storage area 568 may be used to store persistent information that should not be lost if the system 502 is powered down. The application programs 566 may use and store information in the non-volatile storage area 568, such as e-mail or other messages used by an e-mail application, and the like. A synchronization application (not shown) also resides on the system 502 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 568 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 562 and run on the mobile computing device 500 described herein (e.g., search engine, extractor module, relevancy ranking module, answer scoring module, etc.).
The system 502 has a power supply 570, which may be implemented as one or more batteries. The power supply 570 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
The system 502 may also include a radio interface layer 572 that performs the function of transmitting and receiving radio frequency communications. The radio interface layer 572 facilitates wireless connectivity between the system 502 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio interface layer 572 are conducted under control of the operating system 564. In other words, communications received by the radio interface layer 572 may be disseminated to the application programs 566 via the operating system 564, and vice versa.
The visual indicator 520 may be used to provide visual notifications, and/or an audio interface 574 may be used for producing audible notifications via the audio transducer 525. In the illustrated embodiment, the visual indicator 520 is a light emitting diode (LED) and the audio transducer 525 is a speaker. These devices may be directly coupled to the power supply 570 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 560 and/or special-purpose processor 561 and other components might shut down for conserving battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 574 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 525, the audio interface 574 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present disclosure, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 502 may further include a video interface 576 that enables operation of on-board cameras or image sensors 504A and 504B to acquire still images, video streams, and the like. The on-board cameras or image sensors 504A and 504B may be the same as or similar to the previously described image sensors 103A and/or 103B. In some examples, the system 502 may include a hinge sensor 532 for obtaining the hinge angle between a first display, such as display 505A, and a second display, such as display 505B. In some examples, the hinge sensor 532 may obtain a hinge angle between the first image sensor 504A and the second image sensor 504B.
In some examples, the special-purpose processor 561 may correspond to a neural processing engine (e.g., NPE) or neural processing unit (NPU).
A mobile computing device 500 implementing the system 502 may have additional features or functionality. For example, the mobile computing device 500 may also include additional data storage devices (removable and/or non-removable) such as, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Data/information generated or captured by the mobile computing device 500 and stored via the system 502 may be stored locally on the mobile computing device 500, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio interface layer 572 or via a wired connection between the mobile computing device 500 and a separate computing device associated with the mobile computing device 500, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed via the mobile computing device 500 via the radio interface layer 572 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
A general order for the steps of the method 600 is shown in
The method starts at 602, where flow may proceed to 604. At 604, a target may be displayed on a first screen of the dual-screen computing device. The method 600 may proceed to 606, where a hinge angle is obtained. In examples, the hinge angle may be obtained at 606 from a hinge sensor, for example the hinge sensor 532, or may instead be calculated from images received from each of the image sensors. The method 600 may proceed to 608, where an image may be received from a first camera or first image sensor and an image may be received from a second camera or second image sensor.
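As a non-limiting illustration of the hinge-angle acquisition at 606, the following Python sketch prefers a direct reading from a hinge sensor and falls back to an image-based estimate when no reading is available. The callables `read_hinge_sensor` and `estimate_from_images` are hypothetical placeholders standing in for the hinge sensor 532 and for an image-based routine; they are not part of any described interface.

```python
from typing import Callable, Optional


def obtain_hinge_angle(
    read_hinge_sensor: Callable[[], Optional[float]],
    estimate_from_images: Callable[[], float],
) -> float:
    """Return a hinge angle in degrees, preferring a direct sensor reading.

    read_hinge_sensor is a hypothetical callable that returns None when no
    hinge sensor is present or no reading is available; in that case the
    angle is estimated from the images of the first and second image sensors.
    """
    angle = read_hinge_sensor()
    if angle is None:
        # No sensor reading: fall back to the image-based estimate.
        angle = estimate_from_images()
    return angle


# Usage: a device with a hinge sensor, and a device without one.
print(obtain_hinge_angle(lambda: 120.0, lambda: 0.0))  # 120.0
print(obtain_hinge_angle(lambda: None, lambda: 95.5))  # 95.5
```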
In examples, the method 600 may proceed to 610 where a plurality of features may be extracted or otherwise obtained from the received images. For example, for a first eye, images of the first eye from the first and second cameras may be provided to the neural network processing pipeline, such as the neural network processing pipeline 308, where the neural network processing pipeline 308 may extract eye features for the first eye. For a second eye, images of the second eye from the first and second cameras may be provided to the neural network processing pipeline, such as the neural network processing pipeline 308, where the neural network processing pipeline 308 may extract eye features for the second eye. In examples, facial images of the user from the first and second cameras may be provided to the landmark feature extractor, such as the landmark feature extractor 314, where the landmark feature extractor 314 may extract landmark features for the user's face.
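Under simplifying assumptions, the feature extraction at 610 can be pictured as combining per-eye features, each computed from both cameras, with facial landmark features. In the sketch below, `eye_extractor` and `landmark_extractor` are hypothetical stand-ins for the neural network processing pipeline 308 and the landmark feature extractor 314, images are reduced to flat lists of numbers, and simple concatenation is assumed as the fusion step purely for illustration.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]  # hypothetical: a flattened image crop
Extractor = Callable[[Image, Image], List[float]]  # (camera 1 crop, camera 2 crop) -> features


def build_gaze_features(
    first_eye_cam1: Image, first_eye_cam2: Image,
    second_eye_cam1: Image, second_eye_cam2: Image,
    face_cam1: Image, face_cam2: Image,
    eye_extractor: Extractor,
    landmark_extractor: Extractor,
) -> List[float]:
    """Combine per-eye features (from both cameras) with facial landmark features."""
    first_eye = eye_extractor(first_eye_cam1, first_eye_cam2)     # features for the first eye
    second_eye = eye_extractor(second_eye_cam1, second_eye_cam2)  # features for the second eye
    landmarks = landmark_extractor(face_cam1, face_cam2)          # facial landmark features
    return first_eye + second_eye + landmarks                     # concatenation (assumed fusion)


# Usage with toy extractors: per-camera mean for eyes, per-camera max for the face.
eye = lambda a, b: [sum(a) / len(a), sum(b) / len(b)]
lm = lambda a, b: [max(a), max(b)]
feats = build_gaze_features(
    [1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [1.0, 1.0], [5.0, 1.0], [2.0, 7.0], eye, lm
)
print(feats)  # [2.0, 2.0, 2.0, 1.0, 5.0, 7.0]
```

In practice the combined vector, rather than any single camera's view, would be what an eye gaze predictor consumes.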
The method may proceed to 612 where eye gaze prediction information and/or head pose prediction information may be generated, where such information may be specific to the hinge angle and the target displayed on the screen. In examples, one or more calibration parameters may be obtained or otherwise generated for the hinge angle and/or target displayed on the screen. Such information may be stored together and later accessed based on an acquired hinge angle. As one example, an angle of offset, Kappa, may be estimated for each user and stored, wherein the Kappa angle may be associated with a hinge angle. In some examples, the Kappa angle may be provided or accessed via a user identifier. In some examples, for a single hinge angle, a plurality of display targets may be sequentially displayed on the screen. Thus, the method 600 may proceed through 606, 608, 610, and 612 multiple times for a single hinge angle, as indicated by 614. In addition, as a user is instructed to change the hinge angle, the method 600 may proceed through 606, 608, 610, and 612 multiple times as indicated by 614. The method 600 may end at 616.
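One hypothetical way to hold the enrollment results of 612 and 614 is a per-user store that records a Kappa offset for each hinge angle and later returns the record enrolled at the closest hinge angle. The sketch assumes that, for each displayed target, a (yaw, pitch) offset in degrees between the direction to the target and the uncalibrated predicted gaze has already been measured, and that Kappa is taken as the mean of these offsets. The names `CalibrationStore`, `KappaRecord`, `enroll`, and `lookup` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class KappaRecord:
    hinge_angle: float  # degrees; hinge angle at which this record was enrolled
    kappa_yaw: float    # degrees; horizontal offset (target minus predicted gaze)
    kappa_pitch: float  # degrees; vertical offset (target minus predicted gaze)


@dataclass
class CalibrationStore:
    # Hypothetical in-memory store: user identifier -> one record per enrolled hinge angle.
    records: Dict[str, List[KappaRecord]] = field(default_factory=dict)

    def enroll(self, user_id: str, hinge_angle: float,
               offsets: Sequence[Tuple[float, float]]) -> KappaRecord:
        """Average the (yaw, pitch) offsets measured for the targets shown at this hinge angle."""
        if not offsets:
            raise ValueError("at least one displayed target is required")
        n = len(offsets)
        record = KappaRecord(
            hinge_angle,
            sum(yaw for yaw, _ in offsets) / n,
            sum(pitch for _, pitch in offsets) / n,
        )
        self.records.setdefault(user_id, []).append(record)
        return record

    def lookup(self, user_id: str, hinge_angle: float) -> Optional[KappaRecord]:
        """Return the record enrolled at the hinge angle closest to the current one."""
        candidates = self.records.get(user_id)
        if not candidates:
            return None
        return min(candidates, key=lambda r: abs(r.hinge_angle - hinge_angle))


# Usage: two targets shown at 90 degrees, one target shown at 150 degrees.
store = CalibrationStore()
store.enroll("user-1", 90.0, [(2.0, 1.0), (4.0, -1.0)])
store.enroll("user-1", 150.0, [(5.0, 0.5)])
print(store.lookup("user-1", 140.0))  # KappaRecord(hinge_angle=150.0, kappa_yaw=5.0, kappa_pitch=0.5)
```

Keying the store by user identifier mirrors the option of accessing the Kappa angle via a user identifier; nearest-angle lookup is one simple policy among several possible ones, such as interpolating between enrolled angles.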
The method starts at 702, where flow may proceed to 704. At 704, a hinge angle between the displays of a dual-screen computing device may be obtained. In examples, the hinge angle may be obtained from a hinge sensor, for example the hinge sensor 532, or may instead be calculated from images received from each of the image sensors. The method 700 may proceed to 706, where one or more calibration parameters may be retrieved based on the hinge angle. In one or more examples, an angle of offset, such as the Kappa angle, may be obtained as one or more of the calibration parameters. In some examples, the one or more calibration parameters may correspond to a confidence level and/or may be utilized for different display configurations. For example, a display configuration where the hinge angle is less than thirty degrees may utilize different calibration parameters than a configuration where the hinge angle is greater than sixty degrees. The method 700 may proceed to 708, where an image may be received from a first camera or first image sensor and an image may be received from a second camera or second image sensor.
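The selection of calibration parameters by display configuration at 706 may be sketched as a mapping from hinge-angle bands to parameter sets. The thirty-degree and sixty-degree thresholds follow the example above; treating the range between them as a third band, and the names `HingeBand`, `classify_hinge_band`, and `select_calibration`, are assumptions made for illustration.

```python
from enum import Enum
from typing import Mapping


class HingeBand(Enum):
    NARROW = "narrow"  # hinge angle below the lower threshold (30 degrees in the example)
    MIDDLE = "middle"  # between the thresholds (a band assumed for illustration)
    WIDE = "wide"      # hinge angle above the upper threshold (60 degrees in the example)


def classify_hinge_band(hinge_angle: float, low: float = 30.0, high: float = 60.0) -> HingeBand:
    """Map a hinge angle in degrees to a display-configuration band."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if hinge_angle < low:
        return HingeBand.NARROW
    if hinge_angle > high:
        return HingeBand.WIDE
    return HingeBand.MIDDLE


def select_calibration(hinge_angle: float, params_by_band: Mapping[HingeBand, dict]) -> dict:
    """Return the calibration parameter set stored for the current configuration."""
    return params_by_band[classify_hinge_band(hinge_angle)]


# Usage with hypothetical parameter sets.
params = {
    HingeBand.NARROW: {"kappa_yaw": 1.0},
    HingeBand.MIDDLE: {"kappa_yaw": 2.0},
    HingeBand.WIDE: {"kappa_yaw": 3.0},
}
print(select_calibration(45.0, params))  # {'kappa_yaw': 2.0}
```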
In examples, the method 700 may proceed to 710 where a plurality of features may be extracted or otherwise obtained from the received images. For example, for a first eye, images of the first eye from the first and second cameras may be provided to the neural network processing pipeline, such as the neural network processing pipeline 308, where the neural network processing pipeline 308 may extract eye features for the first eye. For a second eye, images of the second eye from the first and second cameras may be provided to the neural network processing pipeline, such as the neural network processing pipeline 308, where the neural network processing pipeline 308 may extract eye features for the second eye. In examples, facial images of the user from the first and second cameras may be provided to the landmark feature extractor, such as the landmark feature extractor 314, where the landmark feature extractor 314 may extract landmark features for the user's face.
The method may proceed to 712 where eye gaze prediction information and/or head pose prediction information may be generated, where such information may be specific to the hinge angle obtained at 704 and/or the Kappa angle obtained at 706. In examples, the hinge angle and/or Kappa angle may be utilized to generate the eye gaze prediction information and/or the head pose prediction information. Alternatively, or in addition, the hinge angle may be utilized to provide a confidence level, where the confidence level may be based on the hinge angle utilized in the enrollment process and the hinge angle obtained at 704. The method 700 may end at 714.
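As a non-limiting sketch of 712, the following code derives a confidence level from the difference between the enrollment hinge angle and the current hinge angle, and applies a stored Kappa offset to an uncalibrated gaze direction. The exponential decay, the `scale_deg` parameter, and the additive yaw/pitch correction are assumptions chosen for simplicity rather than a prescribed formula.

```python
import math
from typing import Tuple


def hinge_confidence(enrolled_angle: float, current_angle: float, scale_deg: float = 20.0) -> float:
    """Confidence in (0, 1]: 1.0 when the hinge angles match, decaying exponentially as they diverge.

    scale_deg is a hypothetical tuning parameter: the angular difference at
    which the confidence falls to 1/e.
    """
    if scale_deg <= 0:
        raise ValueError("scale_deg must be positive")
    return math.exp(-abs(current_angle - enrolled_angle) / scale_deg)


def apply_kappa(pred_yaw: float, pred_pitch: float,
                kappa_yaw: float, kappa_pitch: float) -> Tuple[float, float]:
    """Shift an uncalibrated gaze direction (degrees) by the enrolled per-user offset."""
    return pred_yaw + kappa_yaw, pred_pitch + kappa_pitch


# Usage: enrolled at 100 degrees, device currently at 120 degrees.
print(round(hinge_confidence(100.0, 120.0), 4))  # 0.3679
print(apply_kappa(10.0, -2.0, 3.0, 0.5))         # (13.0, -1.5)
```

With the default `scale_deg` of 20 degrees, identical enrollment and current hinge angles yield a confidence of 1.0, and a 20-degree difference yields about 0.37.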
The method starts at 802, where flow may proceed to 804. At 804, an angle may be obtained directly from a hinge sensor as previously discussed. Alternatively, or in addition, an image from a first camera may be obtained at 808 and an image from a second camera may be obtained at 810. In examples, overlapping regions or overlapping areas within such images may be identified at 812, such that an angle for each camera with respect to the other camera may be determined or otherwise generated at 814 based on such overlapping image regions. That is, where a hinge sensor may not be present to provide a direct measurement of a hinge angle between a first display and a second display, the method 800 may generate such angle for use in the calibration and/or enrollment process, and/or for use when generating the eye gaze prediction information and/or the head pose prediction information. The method 800 may end at 816.
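The image-based estimate at 812 and 814 can be illustrated with a simplified geometric model. The sketch assumes pinhole cameras sharing a known focal length `focal_px` and horizontal principal point `cx` (both in pixels), feature matches already found in the overlapping region, scene points distant enough that the baseline between the cameras can be ignored, relative rotation only about the vertical hinge axis, and cameras facing outward from their displays so that they are parallel when the hinge angle is 180 degrees. Under those assumptions, each match gives the relative rotation as the difference of its two bearing angles, and the hinge angle is 180 degrees minus the averaged rotation. A fuller implementation would more likely estimate a complete three-dimensional rotation between the cameras.

```python
import math
from typing import Iterable, Tuple


def relative_yaw_deg(matches: Iterable[Tuple[float, float]], focal_px: float, cx: float) -> float:
    """Mean rotation (degrees) of camera 2 relative to camera 1 about the hinge axis.

    Each match is (u1, u2): the horizontal pixel coordinate of the same
    scene point in the first and second images.
    """
    diffs = []
    for u1, u2 in matches:
        bearing1 = math.atan2(u1 - cx, focal_px)  # horizontal angle seen by camera 1
        bearing2 = math.atan2(u2 - cx, focal_px)  # horizontal angle seen by camera 2
        diffs.append(bearing1 - bearing2)
    if not diffs:
        raise ValueError("no matches found in the overlapping region")
    return math.degrees(sum(diffs) / len(diffs))


def hinge_angle_from_matches(matches: Iterable[Tuple[float, float]], focal_px: float, cx: float) -> float:
    """Hinge angle (degrees), assuming the cameras are parallel at a 180-degree hinge."""
    return 180.0 - abs(relative_yaw_deg(matches, focal_px, cx))


# Usage: synthesize matches for a 30-degree relative rotation between the cameras.
f, cx = 500.0, 320.0
theta = math.radians(30.0)
matches = [
    (cx + f * math.tan(math.radians(p)), cx + f * math.tan(math.radians(p) - theta))
    for p in (10.0, 20.0, 35.0)
]
print(round(hinge_angle_from_matches(matches, f, cx), 6))  # 150.0
```

Averaging over several matches in the overlapping region reduces the influence of any single noisy correspondence.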
One or more of the previously described program modules 406 or software applications 420 may be employed by server device 1002 and/or the personal computer 1004, tablet computing device 1006, or mobile computing device 1008, as described above. For example, the server device 1002 may include the enrollment engine 421, the gaze predictor 422, the head pose predictor 423, the eye feature extractor 424, and the landmark feature extractor 425.
The server device 1002 may provide data to and from a client computing device such as a personal computer 1004, a tablet computing device 1006 and/or a mobile computing device 1008 (e.g., a smart phone) through a network 1015. By way of example, the computer system described above may be embodied in a personal computer 1004, a tablet computing device 1006 and/or a mobile computing device 1008 (e.g., a smart phone). Any of these embodiments of the computing devices may obtain content from the store, in addition to receiving graphical data useable to be either pre-processed at a graphic-originating system, or post-processed at a receiving computing system.
In addition, the aspects and functionalities described herein may operate over distributed systems (e.g., cloud-based computing systems), where application functionality, memory, data storage and retrieval and various processing functions may be operated remotely from each other over a distributed computing network, such as the Internet or an intranet. User interfaces and information of various types may be displayed via on-board computing device displays or via remote display units associated with one or more computing devices. For example, user interfaces and information of various types may be displayed and interacted with on a wall surface onto which user interfaces and information of various types are projected. Interaction with the multitude of computing systems with which embodiments of the invention may be practiced includes keystroke entry, touch screen entry, voice or other audio entry, gesture entry where an associated computing device is equipped with detection (e.g., camera) functionality for capturing and interpreting user gestures for controlling the functionality of the computing device, and the like.
Aspects of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to aspects of the disclosure. The functions/acts noted in the blocks may occur out of the order as shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of the claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application.
The present disclosure relates to systems and methods for generating a predicted eye gaze of a user according to at least the examples provided in the sections below:
In yet another aspect, some examples include a system including one or more processors and memory coupled to the one or more processors, the memory storing one or more instructions which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described herein (e.g., A1-A10 described above).
In yet another aspect, some examples include a non-transitory computer-readable storage medium storing one or more programs for execution by one or more processors of a storage device, the one or more programs including instructions for performing any of the methods described herein (e.g., A1-A10 described above).
In yet another aspect, some examples include a system including one or more processors and memory coupled to the one or more processors, the memory storing one or more instructions which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described herein (e.g., B1-B5 described above).
In yet another aspect, some examples include a non-transitory computer-readable storage medium storing one or more programs for execution by one or more processors of a storage device, the one or more programs including instructions for performing any of the methods described herein (e.g., B1-B5 described above).
Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.