The present disclosure generally relates to improved computer-based systems and improved computing devices configured for tracking eye-related parameters during user interaction with electronic computing devices.
In some embodiments, the present disclosure provides an exemplary technically improved computer-based method that includes continuously obtaining, by at least one processor, a visual input comprising a plurality of representations of at least one eye of at least one user to continuously track the plurality of representations over a predetermined time duration; wherein the visual input comprises a series of video frames, a series of images, or both; continuously applying, by the at least one processor, at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously inputting, by the at least one processor, the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the classified activity.
Some embodiments of the present disclosure relate to a system including: a camera component, wherein the camera component is configured to acquire a visual input, wherein the visual input includes a real-time representation of at least one eye of at least one user and wherein the visual input comprises at least one video frame, at least one image, or both; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: continuously apply at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously input the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to determine an attentiveness level of the at least one user over a predetermined time duration to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the at least one classified activity.
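The claimed pipeline — gaze extraction per frame, accumulation into a time series, and classification with an engagement score — can be illustrated with a minimal sketch. All function names, the frame dictionary layout, and the stub classifier below are illustrative assumptions; the disclosure's actual EGMT algorithm and trained ATNN are not specified here.

```python
# Hypothetical sketch of the claimed flow: EGMT over frames -> time series
# of gaze vectors -> activity classification plus an engagement measure.
# The "ATNN" is replaced by a trivial stand-in heuristic for illustration.
from typing import List, Tuple

GazeVector = Tuple[float, float]

def egmt(frame: dict) -> GazeVector:
    """Stand-in eye-gaze movement tracking: gaze = pupil - eye center."""
    px, py = frame["pupil"]
    cx, cy = frame["eye_center"]
    return (px - cx, py - cy)

def atnn_classify(series: List[GazeVector]) -> Tuple[str, float]:
    """Stub 'neural network': labels mostly-horizontal scan paths as
    reading, and reports the fraction of moving samples as engagement."""
    moves = [abs(x) + abs(y) for x, y in series]
    horizontal = sum(1 for x, y in series if abs(x) >= abs(y))
    activity = "reading" if horizontal > len(series) / 2 else "watching video"
    engagement = sum(1 for m in moves if m > 0) / max(len(moves), 1)
    return activity, engagement

def track(frames: List[dict]) -> Tuple[str, float]:
    series = [egmt(f) for f in frames]   # time series of eye-gaze vectors
    return atnn_classify(series)         # classified activity + engagement
```

In a deployed system the stub heuristic would be replaced by the trained ATNN operating on continuously acquired camera frames; the sketch only fixes the data flow.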
In some embodiments, the visual input includes a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.
In some embodiments, the at least one processor continuously applies to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.
In some embodiments, application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.
In some embodiments, the processor continually obtains a time series of additional facial feature vectors.
In some embodiments, the plurality of representations includes at least one eye movement of at least one user.
In some embodiments, the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.
In some embodiments, the predetermined time duration ranges from 1 to 300 minutes.
In some embodiments, the at least one eye gaze vector includes at least two reference points, the at least two reference points including: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.
In some embodiments, the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors includes at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.
In some embodiments, the at least one processor averages the at least one first eye gaze vector and the at least one second eye gaze vector.
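The two-reference-point gaze vector and the per-eye averaging described above can be sketched as follows. Treating the gaze vector as the pupil position minus the eye-center position, and combining eyes by a componentwise mean, are assumptions for illustration; the disclosure does not fix the coordinate convention.

```python
# Minimal sketch: a gaze vector from the two reference points (eye center
# and pupil), and averaging of the first-eye and second-eye vectors.
def gaze_vector(pupil, eye_center):
    """Vector from the eye-center reference point to the pupil."""
    return (pupil[0] - eye_center[0], pupil[1] - eye_center[1])

def combined_gaze(left_pupil, left_center, right_pupil, right_center):
    """Componentwise average of the left-eye and right-eye gaze vectors."""
    lx, ly = gaze_vector(left_pupil, left_center)
    rx, ry = gaze_vector(right_pupil, right_center)
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)
```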
In some embodiments, the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.
Various embodiments of the present disclosure can be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the present disclosure. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ one or more illustrative embodiments.
Embodiments of the present disclosure, briefly summarized above and discussed in greater detail below, can be understood by reference to the illustrative embodiments of the disclosure depicted in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. The figures are not drawn to scale and may be simplified for clarity. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
Among those benefits and improvements that have been disclosed, other objects and advantages of this disclosure can become apparent from the following description taken in conjunction with the accompanying figures. Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the disclosure that may be embodied in various forms. In addition, each of the examples given in connection with the various embodiments of the present disclosure is intended to be illustrative, and not restrictive.
Throughout the specification, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrases “in one embodiment” and “in some embodiments” as used herein do not necessarily refer to the same embodiment(s), though it may. Furthermore, the phrases “in another embodiment” and “in some other embodiments” as used herein do not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the disclosure may be readily combined, without departing from the scope or spirit of the disclosure. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other implementations whether or not explicitly described herein.
The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” include plural references. The meaning of “in” includes “in” and “on.”
It is understood that at least one aspect/functionality of various embodiments described herein can be performed in real-time and/or dynamically. As used herein, the term “real-time” is directed to an event/action that can occur instantaneously or almost instantaneously in time when another event/action has occurred. For example, the “real-time processing,” “real-time computation,” and “real-time execution” all pertain to the performance of a computation during the actual time that the related physical process (e.g., a user interacting with an application on a mobile device) occurs, in order that results of the computation can be used in guiding the physical process.
As used herein, the term “dynamically” means that events and/or actions can be triggered and/or occur without any human intervention. In some embodiments, events and/or actions in accordance with the present disclosure can be in real-time and/or based on a predetermined periodicity of at least one of: nanosecond, several nanoseconds, millisecond, several milliseconds, second, several seconds, minute, several minutes, hourly, several hours, daily, several days, weekly, monthly, etc.
As used herein, the term “runtime” corresponds to any behavior that is dynamically determined during an execution of a software application or at least a portion of software application.
In some embodiments, the disclosed specially programmed computing systems with associated devices are configured to operate in the distributed network environment, communicating over a suitable data communication network (e.g., the Internet, etc.) and utilizing at least one suitable data communication protocol (e.g., IPX/SPX, X.25, AX.25, AppleTalk™, TCP/IP (e.g., HTTP), etc.). Of note, the embodiments described herein may, of course, be implemented using any appropriate hardware and/or computing software languages. In this regard, those of ordinary skill in the art are well versed in the type of computer hardware that may be used, the type of computer programming techniques that may be used (e.g., object oriented programming), and the type of computer programming languages that may be used (e.g., C++, Objective-C, Swift, Java, Javascript). The aforementioned examples are, of course, illustrative and not restrictive.
As used herein, the terms “image(s)” and “image data” are used interchangeably to identify data representative of visual content, which includes, but is not limited to, images encoded in various computer formats (e.g., “.jpg”, “.bmp,” etc.), streaming video based on various protocols (e.g., Real-time Streaming Protocol (RTSP), Real-time Transport Protocol (RTP), Real-time Transport Control Protocol (RTCP), etc.), recorded/generated non-streaming video of various formats (e.g., “.mov,” “.mpg,” “.wmv,” “.avi,” “.flv,” etc.), and real-time visual imagery acquired through a camera application on a mobile device.
The material disclosed herein may be implemented in software or firmware or a combination of them or as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.
In another form, a non-transitory article, such as a non-transitory computer readable medium, may be used with any of the examples mentioned above or other examples except that it does not include a transitory signal per se. It does include those elements other than a signal per se that may hold data temporarily in a “transitory” fashion such as RAM and so forth.
As used herein, the terms “computer engine” and “engine” identify at least one software component and/or a combination of at least one software component and at least one hardware component which are designed/programmed/configured to manage/control other software and/or hardware components (such as the libraries, software development kits (SDKs), objects, etc.).
Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some embodiments, the one or more processors may be implemented as a Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors; x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU). In various implementations, the one or more processors may be dual-core processor(s), dual-core mobile processor(s), and so forth.
Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that make the logic or processor.
As used herein, the term “user” shall have a meaning of at least one user.
As used herein, the terms “face” and “head” are used interchangeably and both refer to any portion of a user's body situated above the user's shoulders. The terms “face” and “head” are meant to encompass any accessories worn by the user in the portion of the user's body above the shoulders including but not limited to, a hat, glasses, jewelry and the like.
The present disclosure, among other things, provides exemplary technical solutions to the technical problem of measuring and tracking a user's engagement with an electronic computing device.
In some embodiments, the electronic computing device may be, without limitation, any electronic computing device that at least includes and/or is operationally associated with at least one other electronic computing device that includes at least one processor, a digital camera, and the disclosed software. For example, an exemplary electronic computing device may be at least one selected from the group of desktop, laptop, mobile device (e.g., tablet, smartphone, etc.), Internet-of-Things (IoT) device (e.g., smart thermostat), and the like. In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to track one or more users' interactions with at least one exemplary electronic computing device as the one or more users interact with the at least one exemplary electronic computing device and/or another electronic device (e.g., another electronic computing device).
In some embodiments, since the at least one exemplary electronic computing device may include at least one camera that acquires visual input related to the one or more users' activities, the exemplary disclosed software with the exemplary disclosed computer system are configured to detect and recognize, for example without limitation, at least one or more of the following: face pose, head pose, anthropometrics, facial expression(s), emotion(s), eye(s), and eye-gaze vector(s). In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to estimate a type of activity each user is engaged in (e.g., reading text, watching video, surfing the Internet, etc.).
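One way to picture the per-frame detections listed above is as a single feature record assembled from the various detectors. The record layout, field names, and dictionary-based stand-in detectors below are illustrative assumptions; real detection and recognition models would populate these fields.

```python
# Hypothetical per-frame feature record for the detections named above
# (head pose, emotion, eye-gaze vector). The extractor is a stand-in that
# reads precomputed values; it is not a real detection library API.
from dataclasses import dataclass

@dataclass
class FrameFeatures:
    head_pose: tuple   # assumed (yaw, pitch, roll) in degrees
    emotion: str       # assumed label from an emotion-recognition model
    gaze: tuple        # assumed (dx, dy) eye-gaze vector

def extract_features(frame: dict) -> FrameFeatures:
    # In a deployed system these values would come from detection models.
    return FrameFeatures(
        head_pose=frame.get("head_pose", (0.0, 0.0, 0.0)),
        emotion=frame.get("emotion", "neutral"),
        gaze=frame.get("gaze", (0.0, 0.0)),
    )
```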
In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to process the visual input (e.g., a set of portrait images) to perform at least one or more of the following:
In some embodiments, as detailed herein, the exemplary disclosed software with the exemplary disclosed computer system are configured to be applied, without limitation, for one or more of the following uses: working environment and/or information safety, advisory software for people who spend time using computers and electronic devices, parental control systems and other similar suitable computer-related activities and uses.
For example, regarding the estimation of reading speed, in at least some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to determine how many lines per time period (e.g., per minute) a person (e.g., a child) is reading by, for example without limitation, using the data to determine amplitude(s) per time period.
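One plausible reading of the lines-per-time-period idea is to count large leftward "return sweeps" in the horizontal gaze signal as line transitions, then normalize by the elapsed time. The sweep threshold, the normalized x-coordinate convention, and the function name below are assumptions for illustration, not the disclosure's method.

```python
# Hedged sketch of reading-speed estimation: each sharp leftward jump in
# the horizontal gaze coordinate is treated as the start of a new line.
def lines_per_minute(gaze_x, duration_seconds, sweep_threshold=-0.5):
    """Count drops in gaze x larger than the threshold as return sweeps,
    then scale the count to lines per minute."""
    sweeps = sum(
        1 for a, b in zip(gaze_x, gaze_x[1:]) if (b - a) < sweep_threshold
    )
    return sweeps * 60.0 / duration_seconds
```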
For example, regarding the determination of focus levels, in at least some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to analyze “stationary” patterns of the eye-gaze curves.
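A common way to operationalize "stationary" gaze patterns is a dispersion test: windows in which the gaze points stay tightly clustered are treated as fixations, and their share of the trace serves as a focus measure. The window length, dispersion threshold, and function name below are illustrative assumptions, not the disclosure's specific analysis.

```python
# Illustrative dispersion-based analysis of stationary gaze patterns:
# report the fraction of sliding windows whose combined x/y spread stays
# under a threshold, as a simple focus-level proxy.
def focus_level(points, window=3, max_dispersion=0.1):
    """Fraction of length-`window` windows whose x-range plus y-range
    does not exceed `max_dispersion` (i.e., near-stationary gaze)."""
    if len(points) < window:
        return 0.0
    total = len(points) - window + 1
    hits = 0
    for i in range(total):
        xs = [p[0] for p in points[i:i + window]]
        ys = [p[1] for p in points[i:i + window]]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            hits += 1
    return hits / total
```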
While a number of embodiments of the present disclosure have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the disclosed methodologies, the disclosed systems, and the disclosed devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated).
At least some aspects of the present disclosure will now be described with reference to the following numbered clauses, hereinafter designated as [C1, C2, C3, C4 . . . ].
C1: A computer-implemented method, comprising: continuously obtaining, by at least one processor, a visual input comprising a plurality of representations of at least one eye of at least one user to continuously track the plurality of representations over a predetermined time duration; wherein the visual input comprises a series of video frames, a series of images, or both; continuously applying, by the at least one processor, at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously inputting, by the at least one processor, the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the classified activity.
C2: The method of C1, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.
C3: The method of C2, further comprising, by the at least one processor, continuously applying to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.
C4: The method of C3, wherein application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.
C5: The method of C4, further comprising, with the at least one processor, continuously obtaining a time series of additional facial feature vectors.
C6: The method of C5, wherein the plurality of representations comprises at least one eye movement of at least one user.
C7: The method of C1, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.
C8: The method of C1, wherein the predetermined time duration ranges from 1 to 300 minutes.
C9: The method of C1, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.
C10: The method of C1, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.
C11: The method of C10, further comprising a step of, by the at least one processor, averaging the at least one first eye gaze vector and the at least one second eye gaze vector.
C12: The method of C1, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.
C13: A system comprising: a camera component, wherein the camera component is configured to acquire a visual input, wherein the visual input comprises a real-time representation of at least one eye of at least one user and wherein the visual input comprises at least one video frame, at least one image, or both; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: continuously apply at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously input the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to determine an attentiveness level of the at least one user over a predetermined time duration to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the at least one classified activity.
C14: The system of C13, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.
C15: The system of C13, wherein the at least one processor continually applies to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.
C16: The system of C15, wherein application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.
C17: The system of C16, wherein the at least one processor continually obtains a time series of additional facial feature vectors.
C18: The system of C13, wherein the plurality of representations comprises at least one eye movement of at least one user.
C19: The system of C13, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.
C20: The system of C13, wherein the predetermined time duration ranges from 1 to 300 minutes.
C21: The system of C13, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.
C22: The system of C13, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.
C23: The system of C22, wherein the at least one processor averages the at least one first eye gaze vector and the at least one second eye gaze vector.
C24: The system of C13, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.
Publications cited throughout this document are hereby incorporated by reference in their entirety.
While one or more embodiments of the present disclosure have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the disclosed methodologies, the disclosed systems/platforms, and the disclosed devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated).
This application claims priority to U.S. Provisional Application No. 62/701,106, entitled “COMPUTER SYSTEMS AND COMPUTER-IMPLEMENTED METHODS CONFIGURED TO TRACK NUMEROUS USER-RELATED PARAMETERS DURING USERS' INTERACTION WITH ELECTRONIC COMPUTING DEVICES”, filed on Jul. 20, 2018, incorporated herein in its entirety.