The subject matter described herein relates generally to autism detection and/or automated motor skills assessment. More particularly, the subject matter described herein includes methods, systems, and computer readable media for early detection of a neurodevelopmental or psychiatric disorder using scalable computational behavioral phenotyping and/or automated motor skills assessment.
Neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorders affect many people throughout the world. Current estimates indicate that 1 in 9 children may have or develop a neurodevelopmental and/or psychiatric disorder, such as an autism spectrum disorder (ASD), an anxiety disorder, or attention-deficit/hyperactivity disorder (ADHD). Research has shown that treatments for various behavioral disorders, including autism, can be more effective when the disorder is diagnosed and treated early. Moreover, early intervention and consistent monitoring can be useful for tracking individual progress and may also be useful for understanding subjects in clinical trials. However, many children are not accurately screened and/or diagnosed as early as possible and/or do not receive adequate care after diagnosis. For example, the average age of autism diagnosis is close to 5 years old in the United States, yet autism may be diagnosed as early as 18 months. Current screening methods are less accurate when administered in real-world settings, especially for girls and children of color. Current assessment techniques generally require trained clinicians and/or expensive equipment and can be very time intensive. Hence, current assessment techniques present barriers to early diagnosis and monitoring of many neurodevelopmental/psychiatric disorders.
This summary lists several embodiments of the presently disclosed subject matter, and in many cases lists variations and permutations of these embodiments. This summary is merely exemplary of the numerous and varied embodiments. Mention of one or more representative features of a given embodiment is likewise exemplary. Such an embodiment can typically exist with or without the feature(s) mentioned; likewise, those features can be applied to other embodiments of the presently disclosed subject matter, whether listed in this summary or not. To avoid excessive repetition, this summary does not list or suggest all possible combinations of such features.
The subject matter described herein includes methods, systems, and computer readable media for early detection of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder using scalable computational behavioral phenotyping. In some embodiments, a method for early detection of a neurodevelopmental or psychiatric disorder using scalable computational behavioral phenotyping occurs at a computing platform including at least one processor and memory. The method includes obtaining user related information, wherein the user related information includes metrics derived from a user interacting with one or more applications executing on at least one user device; generating, using the user related information and a machine learning based model, a user assessment report including a prediction value indicating a likelihood that the user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder and a prediction confidence value computed using relative contributions of the metrics to the prediction value generated using the machine learning based model; and providing the user assessment report to a display or a data store.
A system for early detection of a neurodevelopmental or psychiatric disorder using scalable computational behavioral phenotyping is also disclosed. In some embodiments, the system includes a computing platform including at least one processor and memory. In some embodiments, the computing platform is configured for: obtaining user related information, wherein the user related information includes metrics derived from a user interacting with one or more applications executing on at least one user device; generating, using the user related information and a machine learning based model, a user assessment report including a prediction value indicating a likelihood that the user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder and a prediction confidence value computed using relative contributions of the metrics to the prediction value generated using the machine learning based model; and providing the user assessment report to a display or a data store. The assessment report also includes individualized descriptions of the user's unique behavioral profile that can be used to guide treatment planning.
The subject matter described herein includes methods, systems, and computer readable media for automated motor skills assessment. In some embodiments, a method for automated motor skills assessment includes obtaining touch input data associated with a user using a touchscreen while the user plays a video game involving touching visual elements that move; analyzing the touch input data to generate motor skills assessment information associated with the user, wherein the motor skills assessment information indicates multiple touch related motor skills metrics; determining, using the motor skills assessment information, that the user exhibits behavior indicative of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder; and providing, via a communications interface, the motor skills assessment information, a diagnosis, or related data.
A system for automated motor skills assessment is also disclosed. In some embodiments, the system includes a computing platform including at least one processor and memory, where the computing platform is configured for: obtaining touch input data associated with a user using a touchscreen while the user plays a video game involving touching visual elements that move; analyzing the touch input data to generate motor skills assessment information associated with the user, wherein the motor skills assessment information indicates multiple touch related motor skills metrics; determining, using the motor skills assessment information, that the user exhibits behavior indicative of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder; and providing, via a communications interface, the motor skills assessment information, a diagnosis, or related data.
The subject matter described herein may be implemented in software in combination with hardware and/or firmware. For example, the subject matter described herein may be implemented in software executed by a processor (e.g., a hardware-based processor). In one example implementation, the subject matter described herein may be implemented using a non-transitory computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer control the computer to perform steps. Example computer readable media suitable for implementing aspects or portions of the subject matter described herein include non-transitory devices, such as disk memory devices, chip memory devices, programmable logic devices, such as field programmable gate arrays, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
As used herein, the term “node” refers to a physical computing platform including one or more processors and memory.
As used herein, the terms “function” or “module” refer to software in combination with hardware and/or firmware for implementing features described herein. In some embodiments, a module may include a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or a processor.
Although some of the aspects of the subject matter disclosed herein have been stated hereinabove and are achieved in whole or in part by the presently disclosed subject matter, other aspects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
The subject matter described herein will now be explained with reference to the accompanying drawings of which:
When a bubble is popped, the same cartoon character reappears from the bottom of the same lane; otherwise, a random one appears after the bubble exits the screen from the top.
Embedding vectors of dimensions m and m+1 were defined from a time-series signal with varying amplitude having data points across time (columns, N = 28) lying on a specific tolerance (r, represented as equally spaced dotted lines/rows). The dotted circles indicate that the specific data point was missing in the original time series. The negative natural logarithm was computed for the ratio between the number of matching (or repeating) templates associated with the m and m+1 dimensional vectors. For example, in the 2-dimensional (m = 2) vectors case, the template vector denoted as · was repeated 2 times, # was repeated 3 times, * was repeated 4 times, and so on. In total, the number of repeated vector sequences was Cm = 16. Similarly, for the (m+1)-component vectors the number of repeated vector sequences was Cm+1 = 5. Then the SampEn for this example was -ln(5/16). Please note that the embedding vectors having missing data points, marked with a red rectangular area, were not considered while estimating Cm and Cm+1. Instead, if the data points had simply been concatenated, template 16 or 17 (without considering the missing value) would have matched with 25 and 26, thereby (artificially) increasing Cm without necessarily increasing Cm+1, or vice versa, leading to inaccurate results.
The subject matter described herein discloses methods, systems, and computer readable media for early detection of a neurodevelopmental or psychiatric disorder using scalable computational behavioral phenotyping and/or automated motor skills assessment. Autism is a neurodevelopmental condition associated with challenges in socialization and communication. Early detection of autism helps ensure timely access to intervention. Autism screening questionnaires have low accuracy when used in real-world settings, such as primary care.
Behavioral signs of autism emerge between 9 and 18 months of age and include reduced attention to people, lack of response to name, differences in facial expressions, and motor delays. Commonly, children are screened for autism at the 18-24-month well-child visit using a parent questionnaire, which has been shown to have lower accuracy in primary care settings, particularly for girls and children of color. There is a need for objective, scalable screening tools to increase the accuracy of autism screening with a goal of reducing disparities in access to early intervention and improving outcomes.
Eye tracking of social attention has been investigated as an objective early autism biomarker. An eye-tracking measure of social attention evaluated in 1,863 12-48-month-old children showed strong specificity (98.0%) but poor sensitivity (17.0%). The complex presentation of autism may be better captured by quantifying multiple autism-related behaviors. To this end, we developed an app that is administered on a tablet or smartphone and displays brief, strategically designed stimuli while the child's behavioral responses are recorded via the device's front-facing camera and quantified via computer vision analysis (CVA) and machine learning (ML). Using the app, various computational approaches to measure individual autism-related behaviors, including social attention, facial expressions and dynamics, head movements, response to name, blink rate, and motor skills, can be utilized.
The subject matter described herein includes methods, systems, or aspects related to using ML to create a novel algorithm that combines multiple behaviors and assesses the feasibility and accuracy of the app for fully automatic autism detection. For example, an algorithm, an app, an ML based model, or a related module in accordance with various aspects of the subject matter described herein can utilize novel methods for generating individualized user assessment reports. In this example, a user assessment report may quantify app administration quality (e.g., a quality score), may quantify the confidence of autism prediction (e.g., a prediction confidence score) for a given user (e.g., a child between 17 and 36 months old), and may explain the user's unique digital phenotype profile.
The subject matter described herein includes results of a study assessing the accuracy of an autism screening digital application (e.g., a mobile app) administered during a pediatric well-child visit to 475 17-36-month-old children, 49 diagnosed with autism and 98 with developmental delay without autism. In the study, the app displayed stimuli designed for eliciting behavioral signs of autism, which were quantified using computer vision analysis (CVA) and machine learning (ML). In particular, up to twenty-three digital phenotypes (e.g., behavioral traits observed from user interaction with a digital app) based on CVA or touch quantified social attention, facial expressions and dynamics, blink rates, head movements, response to name, and motor skills (see
The subject matter described herein includes an ML algorithm (e.g., a trained model) for receiving as input a digital phenotype profile (e.g., observable behavioral traits represented by metrics observed or derived from user interaction with a digital app executing on a user device) and outputting diagnostic or predictive information. In a study described herein, the ML algorithm combining app-derived traits (also referred to as app variables or app features) showed high diagnostic accuracy: area under the receiver operating characteristic curve (AUC)=0.90, sensitivity 87.8%, and specificity 80.8% distinguishing autism versus neurotypical children; AUC=0.86, sensitivity 81.6%, and specificity 80.5% distinguishing autism versus non-autism. Results demonstrate that digital phenotyping is an objective, scalable approach to autism screening in real-world settings.
Further, by providing techniques, mechanisms, and/or methods for early detection of a neurodevelopmental or psychiatric disorder using scalable computational behavioral phenotyping and automated motor skills assessment, diagnosis and/or treatment for various neurodevelopmental/psychiatric disorders (e.g., an autism spectrum disorder (ASD), an anxiety disorder, or attention-deficit/hyperactivity disorder (ADHD)) may be performed quickly and efficiently. Moreover, by providing automated user and motor skills assessments using a camera, a touchscreen, and/or software executing on mobile devices or other relatively inexpensive devices, cost barriers associated with diagnosis and/or treatment of neurodevelopmental/psychiatric disorders may be alleviated. Furthermore, using aspects of the present subject matter, diagnosis and/or treatment for various neurodevelopmental/psychiatric disorders in young children (e.g., ages 1-5) may be facilitated and/or improved over conventional methods, thereby allowing treatments, strategies, and/or intervention methods to be implemented more broadly and earlier than previously possible with conventional methods.
Additional details and example methods, mechanisms, techniques, and/or systems for early detection of autism or related aspects are further described in the EXAMPLES provided herein.
Computing platform 100 may be any suitable entity (e.g., a mobile device or a server) configurable for generating user assessments using scalable computational behavioral phenotyping. For example, computing platform 100 may include a memory and at least one processor for executing a module (e.g., an app or other software) for early detection of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder using scalable computational behavioral phenotyping. In this example, computing platform 100 may also include a user interface (e.g., a display or a touchscreen) for providing a video or a video game containing stimuli designed or usable to identify a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder in the user (e.g., a child, an adult, etc.) and a camera (e.g., a video camera) or other sensor(s) (e.g., touchscreen sensor(s)) for capturing user responses or behaviors (e.g., eye gaze, eye movements, visual and/or hand motor skills data (e.g., eye and/or hand movements during gameplay), facial expressions, head poses, head movements, eyebrow and mouth movements, responses to video-based stimuli, attention indicators to various stimuli, and/or responses to name being called). Continuing with this example, the module executing at computing platform 100 may generate and use metrics from the captured user interaction or related data in a prediction or diagnostic model (e.g., a trained machine learning based model that utilizes a multiple-tree-based extreme gradient-boosting (XGBoost) algorithm). For example, the prediction model may be used in determining user assessment information and/or a related diagnosis (e.g., a diagnosis of a neurodevelopmental/psychiatric disorder or a related metric, such as a decimal value between 0 and 1 indicating the likelihood of a user having a particular neurodevelopmental/psychiatric disorder). See Chen et al. titled “XGBoost: A Scalable Tree Boosting System” (Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016: 785-94) for additional or example methodological details.
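For illustration only, the following is a minimal, self-contained sketch of gradient boosting with decision-stump base learners, a simplified stand-in for the XGBoost-style prediction model described above; the feature names, toy data, and hyperparameters are hypothetical and are not the actual trained model or study data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Fit a one-split regression stump minimizing squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    _, j, thr, lmean, rmean = best
    return lambda row: lmean if row[j] <= thr else rmean

def boost(X, y, rounds=20, lr=0.5):
    """Gradient boosting for binary classification with stump base learners."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # log-odds prior
    stumps, scores = [], [f0] * len(X)
    for _ in range(rounds):
        # Pseudo-residuals of the logistic loss: label minus predicted probability
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump(row) for s, row in zip(scores, X)]
    return lambda row: sigmoid(f0 + lr * sum(st(row) for st in stumps))

# Hypothetical rows: [gaze_percent_social, response_to_name_proportion, touch_popping_rate]
X = [[0.70, 0.9, 0.8], [0.65, 1.0, 0.9], [0.40, 0.2, 0.4],
     [0.35, 0.1, 0.5], [0.72, 0.8, 0.7], [0.30, 0.0, 0.3]]
y = [0, 0, 1, 1, 0, 1]  # 1 = pattern associated with the target label in this toy set

predict_proba = boost(X, y)
pred_high = predict_proba([0.33, 0.1, 0.4])  # resembles the label-1 rows
pred_low = predict_proba([0.71, 0.9, 0.8])   # resembles the label-0 rows
```

The sketch shows only the core idea of boosting over trees; a production XGBoost model adds regularization, second-order gradients, and many other refinements.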
In some embodiments, computing platform 100 or a related module may include functionality for generating a user assessment report using a machine learning based model and a digital phenotype profile (e.g., data indicating behavioral traits obtained from or derived from user interaction with one or more apps executing on computing platform 100 or another device). In such embodiments, the user assessment report may include a prediction value related to a diagnosis (e.g., indicating the likelihood a user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder) generated by the model along with a prediction confidence value indicating a confidence or likelihood that the prediction value is accurate and a quality value indicating whether an assessment should be readministered (e.g., because the metrics (e.g., app-derived features) are poor quality or insufficient for an accurate assessment).
Computing platform 100 may include processor(s) 102. Processor(s) 102 may represent any suitable entity or entities (e.g., one or more hardware-based processors) for processing information and executing instructions or operations. Each of processor(s) 102 may be any type of processor, such as a central processing unit (CPU), a microprocessor, a multi-core processor, and the like. Computing platform 100 may further include a memory 106 for storing information and instructions to be executed by processor(s) 102.
In some embodiments, memory 106 can comprise one or more of random access memory (RAM), read only memory (ROM), static storage such as a magnetic or optical disk, or any other type of machine or non-transitory computer-readable medium. Computing platform 100 may further include one or more communications interface(s) 110, such as a network interface card or a communications device, configured to provide communications access to various entities (e.g., other computing platforms). In some embodiments, one or more communications interface(s) 110 may include a user interface configured for allowing a user (e.g., a diagnostic subject for assessment or an assessment operator) to interact with computing platform 100 or related entities. For example, a user interface may include a graphical user interface (GUI) for providing a questionnaire to a user, for receiving input from the user, and/or for displaying region-based stimuli to the user. In some embodiments, memory 106 may be utilized to store a user assessment module (UAM) 104, or software therein, and a UAM related storage 108.
UAM 104 may be any suitable entity (e.g., software executing on one or more processors) for performing one or more aspects associated with user assessment. In some embodiments, UAM 104 may be configured for early detection of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder using scalable computational behavioral phenotyping. For example, UAM 104 may be configured for obtaining user related information, wherein the user related information includes metrics derived from a user interacting with one or more applications executing on at least one user device; generating, using the user related information and a machine learning based model, a user assessment report including a prediction value indicating a likelihood that the user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder and a prediction confidence value computed using relative contributions of the metrics to the prediction value generated using the machine learning based model; and providing the user assessment report to a display or a data store.
In some embodiments, UAM 104 or another entity may provide a diagnostic application (e.g., an autism screening digital application) for users (e.g., small children) that provides various stimuli in various forms (e.g., videos, games, audio, etc.). In such embodiments, UAM 104 or another entity may capture user responses and/or related data (e.g., recordings of the user and environment, touchscreen data, etc.) and may then generate or derive metrics from the user interaction (e.g., using various algorithms, techniques, or methods) and use the metrics as input into a machine learning based algorithm (e.g., an XGBoost model) for determining a prediction value. In some embodiments, UAM 104 or another entity may use various techniques or algorithms (e.g., SHAP analysis, a local interpretable model-agnostic explanations (LIME) approach, a permutation importance approach, a feature importance approach, etc.) and related data to determine user-specific model interpretability data, e.g., a prediction confidence value and a quality value.
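For illustration, per-feature attributions of the kind SHAP analysis produces can be sketched with an exact Shapley-value computation, feasible here because the feature count is tiny; the scoring function, baseline values, and the "confidence" proxy below are hypothetical stand-ins, not the actual model or the confidence computation used in the study.

```python
import itertools
import math

def shapley_values(model, x, baseline):
    """Exact Shapley attributions over len(x) features; 'absent' features
    are filled with baseline (population-average) values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in itertools.combinations(others, k):
                weight = (math.factorial(k) * math.factorial(n - k - 1)
                          / math.factorial(n))
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Hypothetical scoring function standing in for the trained model
def model(row):
    return 1.0 / (1.0 + math.exp(-(2.0 * row[0] - 1.5 * row[1] + 0.5 * row[2])))

baseline = [0.5, 0.5, 0.5]  # illustrative population-average feature values
x = [0.9, 0.1, 0.6]
phi = shapley_values(model, x, baseline)

prediction = model(x)
# Efficiency property: contributions sum to prediction minus baseline output
recon = model(baseline) + sum(phi)

# One possible confidence proxy: fraction of total |contribution| pushing in
# the same direction as the overall prediction shift
shift = prediction - model(baseline)
total = sum(abs(p) for p in phi)
confidence = sum(abs(p) for p in phi if p * shift > 0) / total if total else 0.0
```

The efficiency property (contributions summing to the prediction shift) is what makes Shapley-style attributions usable for explaining an individual prediction and for deriving per-user confidence measures.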
In some embodiments, computing platform 100 and/or UAM 104 may be communicatively coupled to one or more input or output (I/O) device(s) 112, e.g., a camera, a touchscreen, a mouse, a keyboard, an input sensor, a display, etc. I/O device(s) 112 may represent any suitable entity (e.g., a camera sensor or camera chip in a smartphone) for providing data to a user (e.g., a display) or for obtaining data from or about the user (e.g., a camera for recording visual images or audio and/or a touchscreen for recording touch input). For example, I/O device(s) 112 may include a two-dimensional camera, a three-dimensional camera, a heat-sensor camera, a touchscreen, touch sensors, etc. In some embodiments, I/O device(s) 112 may be usable for recording a user and user input during a user assessment (e.g., while the user is watching a video containing region-based stimuli or playing a video game).
In some embodiments, UAM 104 or another entity may quantify multiple digital phenotypes, e.g., behavioral traits or attributes, associated with a user (e.g., an ASD assessment subject). For example, UAM 104 or another entity may measure 19 CVA-based and 4 touch-based traits. Example digital phenotypes and related methods are discussed below. Additional information regarding the computation of these variables, missing data rates, and their pairwise correlation coefficients is further discussed in EXAMPLE 2 below.
Facing forward: During social and non-social videos, UAM 104 or another entity may compute the average percentage of time that a user faced the screen. In some embodiments, frames in which a user is facing forward may be identified using three rules: the eyes were open, the estimated gaze was at or close to the screen area, and the face was relatively steady. For example, a digital phenotype or metric "Facing Forward" indicating the average percentage of time that a user faced the screen may be used as a proxy for the user's attention to the videos. See Chang et al. titled "Computational Methods to Measure Patterns of Gaze in Toddlers With Autism Spectrum Disorder" (JAMA Pediatr 2021; 175(8): 827-36) for additional or example methodological details.
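The three filtering rules above can be sketched as a simple per-frame filter; the per-frame fields and threshold values are illustrative assumptions, not the study's calibrated parameters.

```python
# Each frame: (eye_openness, gaze_offset_deg, head_displacement) -- hypothetical
# per-frame outputs of a CVA pipeline.
def facing_forward_percentage(frames,
                              min_eye_openness=0.2,
                              max_gaze_offset=10.0,
                              max_displacement=0.05):
    """Percentage of frames where the user faces the screen, applying the
    three rules: eyes open, gaze at/near the screen, face relatively steady."""
    if not frames:
        return 0.0
    facing = sum(
        1 for eye, gaze, disp in frames
        if eye >= min_eye_openness
        and abs(gaze) <= max_gaze_offset
        and disp <= max_displacement
    )
    return 100.0 * facing / len(frames)

frames = [
    (0.8, 2.0, 0.01),   # facing forward
    (0.9, 4.0, 0.02),   # facing forward
    (0.05, 3.0, 0.01),  # eyes closed (blink)
    (0.8, 25.0, 0.01),  # gaze away from the screen
    (0.9, 1.0, 0.30),   # face moving too much
]
pct = facing_forward_percentage(frames)
```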
Social attention: UAM 104 or another entity may display two videos featuring clearly separable social and non-social stimuli on each side of the screen, designed to capture social/non-social attentional preference. For example, a digital phenotype or metric “Gaze Percent Social” may be defined as the percentage of time the user gazed at the social half of the screen, and the “Gaze Silhouette Score” may reflect how concentrated versus spread out the detected gaze clusters were. See the Chang et al. reference for additional or example methodological details.
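The two metrics above can be sketched as follows, using a hand-rolled silhouette coefficient over one-dimensional gaze coordinates; the normalized coordinates, fixed screen midpoint, and precomputed cluster labels are simplifying assumptions (in practice gaze points are two-dimensional and clusters are found automatically).

```python
def gaze_percent_social(gaze_x, screen_midpoint=0.5, social_side="left"):
    """Percentage of gaze samples on the social half of the screen
    (x normalized to [0, 1]; social stimulus assumed on the left)."""
    if not gaze_x:
        return 0.0
    if social_side == "left":
        social = sum(1 for x in gaze_x if x < screen_midpoint)
    else:
        social = sum(1 for x in gaze_x if x >= screen_midpoint)
    return 100.0 * social / len(gaze_x)

def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D gaze points: values near 1 mean
    tight, well-separated gaze clusters; values near 0 mean diffuse gaze."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        other_means = []
        for lab in set(labels) - {li}:
            ds = [abs(p - q) for q, lj in zip(points, labels) if lj == lab]
            other_means.append(sum(ds) / len(ds))
        if not same or not other_means:
            continue
        a = sum(same) / len(same)   # mean intra-cluster distance
        b = min(other_means)        # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

gaze = [0.10, 0.12, 0.15, 0.11, 0.80, 0.82]  # mostly on the left (social) half
labels = [0, 0, 0, 0, 1, 1]                  # two gaze clusters (given here)
pct_social = gaze_percent_social(gaze)
sil = silhouette(gaze, labels)
```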
Attention to speech: UAM 104 or another entity may display a video with two actors, one on each side of the screen, taking turns in a conversation (see Video 1). For example, a digital phenotype or metric may be defined as the correlation between a user's gaze patterns and the alternating conversation. See the Chang et al. reference for additional methodological details.
Blink rate: During social and non-social videos, UAM 104 or another entity may compute a blink rate for a user. For example, a digital phenotype or metric "Blink Rate" may be used as a proxy to indicate attentional engagement and may be computed using CVA involving a recording of the user's eyes. See Babu et al. titled "Blink rate and facial orientation reveal distinctive patterns of attentional engagement in autistic toddlers: a digital phenotyping approach" (Sci Rep 2023; 13(1): 7158) for additional or example methodological details.
Facial dynamics complexity: UAM 104 or another entity may compute or estimate the complexity of facial landmarks' dynamics, e.g., by estimating the complexity of the eyebrows and mouth regions of a user's face using multiscale entropy. For example, a digital phenotype or metric "Mouth Complexity" may be computed for indicating the average complexity of the mouth region during social and non-social videos and another digital phenotype or metric "Eyebrows Complexity" may be computed for indicating the average complexity of the eyebrows region during social and non-social videos. See Krishnappa Babu et al. titled "Exploring complexity of facial dynamics in autism spectrum disorder" (IEEE Trans Affect Comput 2021) for additional or example methodological details.
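Multiscale entropy applies sample entropy (SampEn, the quantity illustrated in the figure caption above) at several time scales. A minimal sketch of plain SampEn on a complete (no missing data) series, using a Chebyshev template distance, is shown below; the tolerance and test signals are illustrative only, and the missing-data handling described in the caption is omitted for brevity.

```python
import math
import random

def sample_entropy(series, m=2, r=0.2):
    """SampEn = -ln(C(m+1) / C(m)), where C(k) counts pairs of k-length
    templates whose Chebyshev distance is within tolerance r."""
    def count_matches(k):
        templates = [series[i:i + k] for i in range(len(series) - k + 1)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count
    cm, cm1 = count_matches(m), count_matches(m + 1)
    if cm == 0 or cm1 == 0:
        return float("inf")  # undefined when no templates match
    return -math.log(cm1 / cm)

# A regular (periodic) signal should score lower than an irregular one
regular = [0.0, 1.0] * 20
random.seed(0)
irregular = [random.random() for _ in range(40)]
se_regular = sample_entropy(regular)
se_irregular = sample_entropy(irregular)
```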
Head movement: UAM 104 or another entity may compute the rate of head movement (computed from a time series of detected facial landmarks) for social and non-social videos. For example, a digital phenotype or metric “Head Movement” may indicate the average head movement of a user during a video. In some embodiments, complexity and acceleration of head movements may be computed for both social stimuli and non-social stimuli using multiscale entropy and derivative of the time series, respectively. See Krishnappa Babu et al. titled “Complexity analysis of head movements in autistic toddlers” (J Child Psychol Psychiatry 2023; 64(1): 156-66) for additional or example methodological details.
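For illustration, movement rate and acceleration from a landmark time series can be sketched as first and second discrete derivatives; the one-dimensional landmark coordinate and the frame rate are simplifying assumptions (real landmark tracks are multi-dimensional).

```python
def head_movement_rate(positions, fps=30.0):
    """Average head movement speed (units/sec) from a time series of a
    tracked landmark position (e.g., nose tip), sampled at fps."""
    if len(positions) < 2:
        return 0.0
    dists = [abs(b - a) for a, b in zip(positions, positions[1:])]
    return sum(dists) / len(dists) * fps

def head_acceleration(positions, fps=30.0):
    """Mean absolute acceleration: discrete second derivative of position."""
    if len(positions) < 3:
        return 0.0
    vel = [(b - a) * fps for a, b in zip(positions, positions[1:])]
    acc = [abs(b - a) * fps for a, b in zip(vel, vel[1:])]
    return sum(acc) / len(acc)

still = [0.0, 0.0, 0.0, 0.0, 0.0]
moving = [0.0, 0.1, 0.3, 0.6, 1.0]
rate_still = head_movement_rate(still)
rate_moving = head_movement_rate(moving)
```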
Response to name: UAM 104 or another entity may perform automatic detection of a user's name being called and the user's response to their name, e.g., by using a recording and audio analysis to detect the name being called and CVA techniques and facial landmarks to detect head turns. For example, a digital phenotype or metric "Response to Name Proportion" may indicate the proportion of times a user oriented to their name being called, and another digital phenotype or metric "Response to Name Delay" may indicate the average delay (in seconds) between the offset of the name call and the head turn. See Perochon et al. titled "A scalable computational approach to assessing response to name in toddlers with autism" (J Child Psychol Psychiatry 2021; 62(9): 1120-31) for additional or example methodological details.
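Once name-call offsets and head-turn times have been detected, the two metrics can be sketched as follows; the 5-second response window and the event timestamps are illustrative assumptions, not the study's actual criteria.

```python
def response_to_name_metrics(name_call_offsets, head_turn_times, window=5.0):
    """Proportion of name calls followed by a head turn within `window`
    seconds, and the average call-offset-to-turn delay over those responses."""
    responses = []
    for offset in name_call_offsets:
        turns = [t for t in head_turn_times if offset <= t <= offset + window]
        if turns:
            responses.append(min(turns) - offset)  # delay to the first turn
    proportion = (len(responses) / len(name_call_offsets)
                  if name_call_offsets else 0.0)
    delay = sum(responses) / len(responses) if responses else None
    return proportion, delay

# Three name calls (offsets in seconds); head turns detected at 10.8s and 31.2s
proportion, delay = response_to_name_metrics([10.0, 20.0, 30.0], [10.8, 31.2])
```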
Touch-based visual-motor skills: UAM 104 or another entity may use touch and device kinetic information provided by a touchscreen or other sensors when a user plays a video game, e.g., a bubble popping game, to quantify touch-based visual motor skills. For example, a digital phenotype or metric "Touch Popping Rate" associated with a bubble popping game may indicate the ratio of popped bubbles over the number of touches; another digital phenotype or metric "Touch Error Variation" associated with the bubble popping game may indicate the standard deviation of the distance between a user's finger position when touching the screen and the center of the closest bubble; another digital phenotype or metric "Touch Average Length" associated with the bubble popping game may indicate the average length of a user's finger trajectory on the screen, and another digital phenotype or metric "Touch Average Applied Force" associated with the bubble popping game may indicate the average estimated force applied on the screen when touching it. See Perochon et al. titled "A tablet-based game for the assessment of visual motor skills in autistic children" (NPJ Digit Med 2023; 6(1): 17) for additional or example methodological details.
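The four touch-based metrics above can be sketched from per-touch records; the record layout (touch path, force estimate, popped flag) and the toy coordinates are illustrative assumptions about what a touchscreen log might provide.

```python
import math

def touch_metrics(touches, bubbles):
    """Popping rate, touch error variation, average trajectory length, and
    average applied force. Each touch is a dict:
    {'path': [(x, y), ...], 'force': float, 'popped': bool}."""
    if not touches:
        return None
    popped = sum(1 for t in touches if t["popped"])
    popping_rate = popped / len(touches)

    # Distance from the first touch point to the nearest bubble center
    def nearest_dist(pt):
        return min(math.dist(pt, b) for b in bubbles)
    errors = [nearest_dist(t["path"][0]) for t in touches]
    mean_err = sum(errors) / len(errors)
    error_variation = math.sqrt(
        sum((e - mean_err) ** 2 for e in errors) / len(errors))

    def path_length(path):
        return sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    avg_length = sum(path_length(t["path"]) for t in touches) / len(touches)

    avg_force = sum(t["force"] for t in touches) / len(touches)
    return popping_rate, error_variation, avg_length, avg_force

bubbles = [(0.0, 0.0), (10.0, 10.0)]  # bubble centers at touch time
touches = [
    {"path": [(0.0, 1.0), (0.0, 0.0)], "force": 0.5, "popped": True},
    {"path": [(9.0, 10.0)], "force": 0.7, "popped": True},
    {"path": [(5.0, 5.0), (6.0, 5.0)], "force": 0.3, "popped": False},
]
rate, err_var, avg_len, avg_force = touch_metrics(touches, bubbles)
```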
In some embodiments, UAM 104 or a related entity (e.g., a medical provider) may administer to a user a therapy or therapies for treating a neurodevelopmental/psychiatric disorder. For example, after performing a user assessment and/or a related diagnosis of a neurodevelopmental/psychiatric disorder, UAM 104 may provide one or more training programs for treating or improving attention, social interaction skills, or motor skills in a user. In this example, the one or more training programs may be based on a number of factors, including user related factors, such as age, name, knowledge, skills, sex, medical history, and/or other information.
In some embodiments, UAM 104 may determine and/or provide user assessment information, a diagnosis, and/or related information (e.g., follow-up information and/or progress information) to one or more entities, such as a user, a system operator, a medical records system, a healthcare provider, a caregiver of the user, or any combination thereof. For example, user assessment information, a diagnosis, and/or related information may be provided via a phone call, a social networking message (e.g., Facebook or Twitter), an email, or a text message. In another example, user assessment information may be provided via an app and/or communications interface(s) 110. When provided via an app, user assessment information may include progress information associated with a user. For example, progress information associated with a user may indicate (e.g., to a caregiver or physician) whether certain therapies and/or strategies are improving or alleviating symptoms associated with a particular neurodevelopmental/psychiatric disorder. In another example, progress information may include aggregated information associated with multiple videos and/or assessment sessions.
Memory 106 may be any suitable entity or entities (e.g., non-transitory computer readable media) for storing various information. Memory 106 may include UAM related storage 108. UAM related storage 108 may be any suitable entity (e.g., a database embodied or stored in computer readable media) storing user data, stimuli (e.g., digital content, games, videos, or video segments), recorded or captured responses, and/or predetermined information. For example, UAM related storage 108 may include machine learning algorithms, algorithms for statistical analysis, SHAP analysis, and/or report generation logic. UAM related storage 108 may also include user data, such as age, name, knowledge, skills, sex, and/or medical history. UAM related storage 108 may also include predetermined information, including information gathered by clinical studies, patient and/or caregiver surveys, and/or doctor assessments.
In some embodiments, predetermined information may include information for analyzing responses; information for determining base responses; information for determining assessment thresholds; coping strategies; recommendations (e.g., for a caregiver or a child); treatment and/or related therapies; information for generating or selecting games, videos, video segments, digital content, or related stimuli usable for a user assessment; and/or other information.
In some embodiments, UAM related storage 108 or another entity may maintain associations between relevant health information and a given user or a given population (e.g., users with similar characteristics and/or within a similar geographical location). For example, users associated with different conditions and/or age groups may be associated with different recommendations, base responses, and/or assessment thresholds for indicating whether user responses are indicative of neurodevelopmental/psychiatric disorders.
In some embodiments, UAM related storage 108 may be accessible by UAM 104 and/or other modules of computing platform 100 and may be located externally to or integrated with UAM 104 and/or computing platform 100. For example, UAM related storage 108 may be stored at a server located remotely from a mobile device containing UAM 104 but still accessible by UAM 104. In another example, UAM related storage 108 may be distributed or separated across multiple nodes.
It will be appreciated that the above described modules or entities are for illustrative purposes and that features or portions of features described herein may be performed by different and/or additional modules, components, or nodes. For example, aspects of user assessment described herein may be performed by UAM 104, computing platform 100, and/or other modules or nodes.
In the study associated with
Based on the Youden Index, an algorithm integrating all app variables showed a high level of accuracy for classification of autism versus neurotypical development with AUC=0.90, CI [0.87-0.93], sensitivity 87.8% (SD=4.9), and specificity 80.8% (SD=2.3). Restricting to administrations with high prediction confidence, the AUC increased to 0.93 (CI [0.89-0.96]). Extended Data Table 2, EXAMPLE 1, and EXAMPLE 2 show all performance results based on individual and combined app variables. Classification of autism versus non-autism (DD-LD combined with neurotypical) also showed strong accuracy: AUC=0.86 (CI [0.83-0.90]), sensitivity 81.6% (SD=5.4), and specificity 80.5% (SD=1.8). Moreover, accuracy for predicting autism remained high when stratifying groups by sex, race, ethnicity, and age (see Extended Data Table 3, EXAMPLE 1, and EXAMPLE 2).
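The Youden Index referenced above selects the operating threshold that maximizes sensitivity + specificity − 1. A minimal sketch of such threshold selection follows; the scores and labels are hypothetical toy values, not study data:

```python
def youden_threshold(scores, labels):
    """Pick the score threshold maximizing Youden's J = sensitivity + specificity - 1.

    scores: model prediction scores (higher = more likely positive class)
    labels: 1 for the positive class (e.g., autism), 0 otherwise
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# hypothetical scores for 8 children (1 = autism, 0 = neurotypical)
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j = youden_threshold(scores, labels)  # threshold 0.40, J = 0.75
```

In practice the threshold would be chosen on cross-validated prediction scores rather than a single small sample.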
Nine autistic children scoring negative on the M-CHAT-R/F were correctly classified by the app as autistic as determined by expert evaluation. Among 40 children screening positive on the M-CHAT-R/F, there were 2 classified neurotypical based on expert evaluation and both were correctly classified by the app. Combining the app algorithm with the M-CHAT-R/F further increased classification performance to AUC=0.97 (CI [0.96-0.98]), specificity=91.8% (SD=4.5), and sensitivity=92.1% (SD=1.6).
To address model interpretability, SHAP values may be used to examine the relative contributions of the app variables to the model's prediction and disambiguate the contribution of each feature from their missingness (see EXAMPLE 2 for additional details).
Referring to
SHAP interaction values indicated that interactions between predictors were significant contributors to the model; on average, app variables alone contributed 64.6% (SD=3.4%) of the model predictions and feature interactions contributed 35.4% (SD=3.4%). Analysis of the missing data SHAP values revealed that missing variables contributed 5.2% (SD=13.2%) of the model predictions. See EXAMPLE 2 below for additional details. Analysis of the individual SHAP values revealed individual behavioral patterns that explained the model's prediction for each participant.
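The split between main effects and interaction effects described above can be derived from a SHAP interaction-value matrix, whose diagonal entries hold per-feature main effects and whose off-diagonal entries hold pairwise interactions. A sketch with a hypothetical 3-feature matrix follows; normalizing by absolute magnitudes is an assumption here, not necessarily the study's exact procedure:

```python
def main_vs_interaction_share(phi):
    """Split a SHAP interaction-value matrix into main-effect and interaction shares.

    phi: square matrix phi[i][j] of SHAP interaction values for one prediction;
    diagonal entries are main effects, off-diagonal entries are pairwise interactions.
    """
    n = len(phi)
    main = sum(abs(phi[i][i]) for i in range(n))
    inter = sum(abs(phi[i][j]) for i in range(n) for j in range(n) if i != j)
    total = main + inter
    return main / total, inter / total

# hypothetical 3-feature interaction matrix for a single prediction
phi = [[0.50, 0.05, 0.05],
       [0.05, 0.20, 0.10],
       [0.05, 0.10, 0.30]]
main_share, inter_share = main_vs_interaction_share(phi)
```

Averaging these shares over all participants would yield population-level figures like those reported above.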
Results of the study depicted in
In some embodiments, methods are described herein for automatically assessing the quality of an app administration and for generating prediction confidence scores, thereby facilitating the use of a mobile app in real world settings. For example, UAM 104 may provide an app usable for presenting interactive content and may capture user interaction data with the app. In this example, UAM 104 may also use the data to generate metrics or related data that can be inputted into a trained machine learning based model to output a prediction or a diagnostic value. Continuing with this example, UAM 104 may include logic for assessing the quality of the app administration and computing prediction confidence scores using SHAP analysis, and may then generate a user assessment comprising a quality score and a prediction confidence score. For example, using SHAP analyses, the app output can provide interpretable information regarding which behavioral features contribute to the overall prediction model and to each child's diagnostic prediction. The latter information could be used prescriptively to identify areas in which behavioral intervention should be targeted.
In some embodiments, a generated quality score may indicate whether the app assessment should be readministered. In this example, the quality score can be included with a prediction confidence score which can inform a provider about the degree of certainty regarding the likelihood a child will be diagnosed with autism. Children with uncertain values could be followed to determine whether autism signs become more pronounced, whereas children with high confidence values could be prioritized for referral or begin intervention while waiting for an evaluation.
In step 402, user related information may be obtained. In some embodiments, user related information may include metrics derived from a user interacting with one or more applications executing on at least one user device. For example, user related information may include a digital phenotype profile or related information indicating a user's behavioral traits or characteristics.
In step 404, a user assessment report may be generated using the user related information and a machine learning based model. In some embodiments, a user assessment report may include a prediction value indicating a likelihood that the user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder and a prediction confidence value computed using relative contributions of the metrics to the prediction value. For example, a user assessment report may be a customized report for indicating what user metrics (e.g., captured application variable data and/or scores) contributed to a user's diagnostic score (e.g., a prediction value) generated by a machine learning based model.
In step 406, the user assessment report may be provided to a display or a data store. For example, a user assessment report may be generated using UAM 104 or a machine learning based model therein and provided to a user (e.g., a subject, a parent, or a medical provider), e.g., via a display device and/or communications interface(s) 110 (e.g., a GUI). In another example, a user assessment report may be generated and stored in memory 106 or UAM related storage 108.
In some embodiments, process 400 may also include administering to the user a therapy for treating the neurodevelopmental/psychiatric disorder. For example, UAM 104 executing on a smartphone may display one or more therapeutic videos for improving a user's attention span for various types of stimuli (e.g., social stimuli), including coaching in strategies that caregivers can use at home to promote social and language skills. In another example, UAM 104 executing on a smartphone may provide interactive content or recommendations to improve a user's social interaction skills and/or motor skills. Another example may involve recommendations for coaching caregivers that promote specific skills, such as learning, communication, and social interaction.
In some embodiments, a user assessment report may include an assessment administration quality value. For example, an assessment administration quality value indicates whether a user assessment should be readministered or retaken. In another example, an assessment administration quality value may be computed based on user metrics (e.g., derived from user interaction with a diagnostic application) weighted by their relative contributions to the prediction value.
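One way such an administration quality value could be computed is as the importance-weighted fraction of expected app metrics that were actually captured during a session; the metric names and weights below are hypothetical illustrations:

```python
def administration_quality(collected, importance):
    """Estimate an administration quality score as the importance-weighted
    fraction of expected app metrics that were captured this session.

    collected: dict metric_name -> bool (was the metric successfully captured?)
    importance: dict metric_name -> nonnegative relative contribution to the prediction
    """
    total = sum(importance.values())
    captured = sum(w for name, w in importance.items() if collected.get(name, False))
    return captured / total

# hypothetical metric importances and capture flags for one session
importance = {"gaze": 0.5, "response_to_name": 0.3, "blink_rate": 0.2}
collected = {"gaze": True, "response_to_name": False, "blink_rate": True}
q = administration_quality(collected, importance)  # 0.7
```

A low score (here 0.7 because a highly weighted metric was missed) could trigger a recommendation to readminister the assessment.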
In some embodiments, computing a prediction confidence value may include performing a model interpretability analysis (e.g., a SHAP analysis, a LIME analysis, a permutation importance analysis, a feature importance analysis, etc.) involving metrics and a machine learning based model. For example, using SHAP or LIME analysis, UAM 104 or another entity may explain or interpret how various app variables (e.g., app-derived metrics) affect the model's behavior or output. In this example, using normalized SHAP interaction values, UAM 104 or another entity may identify the relative importance of each potential app variable to the model's output, e.g., at an overall level or population level. Continuing with this example, UAM 104 or another entity may also use normalized SHAP interaction values for the actual app variables obtained for a particular user (e.g., the actual app variables may be a subset of the potential app variables that can be obtained or derived) to determine how those particular app variables affected the user's particular prediction value generated by the model.
In some embodiments, performing a model interpretability analysis may include generating normalized SHAP interaction values associated with app-derived metrics and using the normalized interaction values in generating a user-specific prediction profile indicating how the user's metrics affected the user's diagnosis or prediction value. For example, SHAP value analysis may provide information about the relative contribution of each of the potential app-derived metrics to the prediction output (e.g., ASD or neurotypical) of a model (e.g., at a population level) and may also provide information usable when generating a user's unique profile indicating what specific metrics (e.g., the user's digital phenotype profile) and to what extent these metrics contributed to the user's diagnosis or prediction value. These metrics can be used for treatment planning and to monitor progress in treatment.
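A user-specific prediction profile of the kind described above could be sketched by normalizing a user's SHAP values by the total absolute contribution; the metric names and values below are hypothetical:

```python
def user_prediction_profile(shap_values):
    """Normalize one user's SHAP values into signed per-metric contribution shares.

    shap_values: dict metric -> SHAP value for this user's prediction;
    positive shares push the prediction toward the positive class (e.g., ASD),
    negative shares push it away.
    """
    total = sum(abs(v) for v in shap_values.values())
    return {metric: value / total for metric, value in shap_values.items()}

# hypothetical SHAP values for one child's prediction
profile = user_prediction_profile(
    {"gaze_to_social": 0.6, "response_to_name": -0.3, "blink_rate": 0.1}
)
```

The resulting shares sum to 1 in absolute magnitude, giving a compact per-child view of which behaviors drove the prediction.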
In some embodiments, a machine learning based model may include an XGBoost algorithm. For example, a trained XGBoost model may comprise or utilize multiple decision trees (e.g., 1,000 trees) that are trained using 5-fold cross-validation where the data is shuffled to compute individual intermediary binary predictions.
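The shuffled 5-fold cross-validation scheme described above can be sketched as follows; a trivial majority-class trainer stands in for the XGBoost ensemble, and the data layout is hypothetical:

```python
import random

def five_fold_predictions(samples, train_fn, seed=0):
    """Shuffled 5-fold cross-validation: each sample receives an out-of-fold
    binary prediction from a model trained on the other four folds.

    samples: list of (features, label) pairs
    train_fn: callable(train_pairs) -> callable(features) -> 0/1 prediction
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    preds = [None] * len(samples)
    for k in range(5):
        held_out = set(folds[k])
        train = [samples[i] for i in idx if i not in held_out]
        model = train_fn(train)
        for i in folds[k]:
            preds[i] = model(samples[i][0])
    return preds

# toy stand-in for an XGBoost ensemble: predict the training majority class
def majority_trainer(train_pairs):
    ones = sum(y for _, y in train_pairs)
    majority = 1 if ones * 2 >= len(train_pairs) else 0
    return lambda feats: majority

data = [([i], 1 if i >= 5 else 0) for i in range(10)]
preds = five_fold_predictions(data, majority_trainer)
```

In the actual setting, `train_fn` would fit an XGBoost classifier, and the intermediary binary predictions would be aggregated into the final prediction value.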
In some embodiments, obtaining user related information may include providing stimuli (e.g., interactive digital content, videos, video games, etc. for eliciting behavioral responses) to the user via a display, capturing user interaction data (e.g., recording of user face, eye, and/or posture, touchscreen input data, user selections, etc.) using one or more input devices (e.g., I/O device(s) 112), and generating metrics related to facial orientation, attention, social attention, facial expressions, head movements, eye movements, gaze, eyebrow movements, mouth movements, user responses to name, blink rate, hand motor skills, visual motor skills, or any combinations thereof.
In some embodiments, a neurodevelopmental/psychiatric disorder may be an ASD, ADHD, or an anxiety disorder diagnosis.
In some embodiments, user assessment and actionable guidance information or related data may be provided to a user, a medical records system, a service provider, a healthcare provider, a system operator, a caregiver of the user, or any combination thereof. For example, where information is provided to a clinician or a medical professional, a user assessment may include stimuli used in a test, a recording of the user during the test, test results, and/or other technical or clinical information, such as recommendations for further assessment or treatment planning. In another example, where information is provided to a parent, a user assessment may include a metric associated with an easy-to-understand scale (e.g., 0-100%) for indicating the likelihood of a user (e.g., a child) having a particular neurodevelopmental/psychiatric disorder and useful suggestions for improving one or more related symptoms associated with the neurodevelopmental/psychiatric disorder.
In some embodiments, computing platform 100 may include a mobile device, a smartphone, a tablet computer, a laptop computer, a computer, a user assessment device, or a medical device.
It will be appreciated that process 400 is for illustrative purposes and that different and/or additional actions may be used. It will also be appreciated that various actions described herein may occur in a different order or sequence.
It should be noted that computing platform 100, UAM 104, and/or functionality described herein may constitute a special purpose computing device. Further, computing platform 100, UAM 104, and/or functionality described herein can improve the technological field of diagnosing and treating various neurodevelopmental/psychiatric disorders by providing mechanisms for early detection of neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder using scalable computational behavioral phenotyping and/or for providing a user assessment report indicating a prediction value and a prediction confidence value. Moreover, such mechanisms can alleviate various barriers, including costs, equipment, and human expertise, associated with conventional (e.g., clinical) methods of diagnosis and treatment of neurodevelopmental/psychiatric disorders.
The subject matter described herein for early detection of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder using scalable computational behavioral phenotyping improves the functionality of user assessment devices and equipment by providing mechanisms (e.g., a user assessment algorithm or a machine learning based algorithm) that generates a user assessment regarding the likelihood of an ASD diagnosis using user related information (e.g., a digital phenotype profile comprising data obtained or derived from user interaction with one or more applications executing on a user device).
It should also be noted that computing platform 100 that implements subject matter described herein may comprise a special purpose computing device usable for various aspects of user assessments, including obtaining metrics associated with a user's digital phenotype profile; generating and using a diagnostic or prediction model that takes the metrics or other data as input to output a prediction or a diagnosis associated with the user; and generating a user assessment report including a prediction value indicating a likelihood that the user has a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder or risk for such a disorder, a prediction confidence value, and/or an assessment administration quality value. A prediction confidence value can be used to prioritize assessment and therapy services in the context of long wait-lists for such services.
Computing platform 500 may include processor(s) 502. Processor(s) 502 may represent any suitable entity or entities (e.g., one or more hardware-based processors) for processing information and executing instructions or operations. Each of processor(s) 502 may be any type of processor, such as a central processing unit (CPU), a microprocessor, a multi-core processor, and the like. Computing platform 500 may further include a memory 506 for storing information and instructions to be executed by processor(s) 502.
In some embodiments, memory 506 can comprise one or more of random access memory (RAM), read only memory (ROM), static storage such as a magnetic or optical disk, or any other type of machine or non-transitory computer-readable medium. Computing platform 500 may further include one or more communications interface(s) 510, such as a network interface card or a communications device, configured to provide communications access to various entities (e.g., other computing platforms). In some embodiments, one or more communications interface(s) 510 may include a user interface configured for allowing a user (e.g., a diagnostic subject for assessment or an assessment operator) to interact with computing platform 500 or related entities. For example, a user interface may include a graphical user interface (GUI) for providing an interactive game or content to the user. In some embodiments, memory 506 may be utilized to store a motor skills assessment module (MSAM) 504, or software therein, and a MSAM related storage 508.
MSAM 504 may be any suitable entity (e.g., software executing on one or more processors) for performing one or more aspects associated with automated motor skills assessments. For example, MSAM 504 may be configured for obtaining touch input data associated with a user using a touchscreen while the user plays a video game involving touching visual elements that move; analyzing the touch input data to generate motor skills assessment information associated with the user, wherein the motor skills assessment information indicates multiple touch related motor skills metrics; determining, using the motor skills assessment information, that the user exhibits behavior indicative of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder; and providing, via a communications interface, the motor skills assessment information, a diagnosis, or related data.
In some embodiments, MSAM 504 or another entity may provide a diagnostic application (e.g., an autism screening digital application) for users (e.g., small children) that provide various stimuli in various forms (e.g., videos, games, audio, etc.). In such embodiments, MSAM 504 or another entity may capture user responses and/or related data (e.g., recordings of the user and environment, touchscreen data, etc.) and may then generate or derive metrics from the user interaction (e.g., using various algorithms, techniques, methods) and use the metrics (e.g., as input to a trained machine learning based model) to perform a motor skills assessment or a diagnosis of a neurodevelopmental/psychiatric disorder.
In some embodiments, computing platform 500 and/or MSAM 504 may be communicatively coupled to one or more input or output (I/O) device(s), e.g., a camera, a touchscreen, a mouse, a keyboard, an input sensor, a display, etc. I/O device(s) 512 may represent any suitable entity (e.g., a camera sensor or camera chip in a smartphone) for providing data to a user (e.g., a display) or for obtaining data from or about the user (e.g., a camera for recording visual images or audio and/or a touchscreen for recording touch input). For example, I/O device(s) 512 may include a two-dimensional camera, a three dimensional camera, a heat-sensor camera, a touchscreen, touch sensors, etc. In some embodiments, I/O device(s) 512 may be usable for recording a user and user input during a motor skills assessment (e.g., while the user is playing a video game).
MSAM 504 or another entity may use touch and device kinetic information provided by a touchscreen or other sensors when a user plays a video game, e.g., a bubble popping game, to quantify touch-based visual motor skills. For example, a digital phenotype or metric “Touch Popping Rate” associated with a bubble popping game may indicate the ratio of popped bubbles over the number of touches; another digital phenotype or metric “Touch Error Variation” associated with the bubble popping game may indicate the standard deviation of the distance between a user's finger position when touching the screen and the center of the closest bubble; another digital phenotype or metric “Touch Average Length” associated with the bubble popping game may indicate the average length of a user's finger trajectory on the screen; and another digital phenotype or metric “Touch Average Applied Force” associated with the bubble popping game may indicate the average estimated force applied on the screen when touching it. See Perochon et al. titled “A tablet-based game for the assessment of visual motor skills in autistic children” (NPJ Digit Med 2023; 6(1): 17) for additional methodological details, the disclosure of which is part of the instant specification and is incorporated herein by reference in its entirety.
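The four bubble-game metrics described above can be sketched as follows; the touch-event data layout is a hypothetical simplification of what a touchscreen would actually report:

```python
import math

def touch_metrics(touches, bubbles):
    """Compute example bubble-game metrics (hypothetical data layout).

    touches: list of dicts with keys:
        'pos'    - (x, y) finger position at touch start
        'path'   - list of (x, y) points of the finger trajectory
        'force'  - estimated applied force for the touch
        'popped' - True if the touch popped a bubble
    bubbles: list of (x, y) bubble-center positions on screen
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    n = len(touches)
    # Touch Popping Rate: popped bubbles over number of touches
    popping_rate = sum(t['popped'] for t in touches) / n
    # Touch Error Variation: std dev of distance from touch point to nearest bubble center
    errors = [min(dist(t['pos'], b) for b in bubbles) for t in touches]
    mean_err = sum(errors) / n
    error_variation = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)
    # Touch Average Length: mean finger-trajectory length per touch
    lengths = [sum(dist(p, q) for p, q in zip(t['path'], t['path'][1:]))
               for t in touches]
    avg_length = sum(lengths) / n
    # Touch Average Applied Force: mean estimated force per touch
    avg_force = sum(t['force'] for t in touches) / n
    return popping_rate, error_variation, avg_length, avg_force

# toy session: two touches, two bubbles
touches = [
    {'pos': (0, 0), 'path': [(0, 0), (3, 4)], 'force': 1.0, 'popped': True},
    {'pos': (10, 0), 'path': [(10, 0)], 'force': 3.0, 'popped': False},
]
bubbles = [(0, 0), (10, 2)]
rate, err_var, avg_len, force = touch_metrics(touches, bubbles)
```

A real implementation would additionally align each touch with the bubble positions at the touch timestamp, since the bubbles move.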
In some embodiments, MSAM 504 or a related entity (e.g., a medical provider) may administer to a user a therapy or therapies for treating a neurodevelopmental/psychiatric disorder. For example, after performing a motor skills assessment and/or a related diagnosis of a neurodevelopmental/psychiatric disorder, MSAM 504 may provide recommendations or one or more training programs for treating or improving the motor skills of a user. In this example, the recommendations or training programs may be based on a number of factors, including user related factors, such as age, name, knowledge, skills, sex, medical history, and/or other information.
In some embodiments, MSAM 504 may determine and/or provide motor skills assessment information, a diagnosis, and/or related information (e.g., follow-up information and/or progress information) to one or more entities, such as a user, a system operator, a medical records system, a healthcare provider, a caregiver of the user, or any combination thereof. For example, motor skills assessment information, screening results, and/or related information may be provided via a phone call, a social networking message (e.g., Facebook or Twitter), an email, or a text message. In another example, motor skills assessment information may be provided via an app and/or communications interface(s) 510. When provided via an app, motor skills assessment information may include progress information associated with a user. For example, progress information associated with a user may indicate (e.g., to a caregiver or physician) whether certain therapies and/or strategies are improving or alleviating symptoms associated with a particular neurodevelopmental/psychiatric disorder. In another example, progress information may include aggregated information associated with multiple videos and/or assessment sessions.
Memory 506 may be any suitable entity or entities (e.g., non-transitory computer readable media) for storing various information. Memory 506 may include MSAM related storage 508. MSAM related storage 508 may be any suitable entity (e.g., a database embodied or stored in computer readable media) storing user data, stimuli (e.g., digital content, games, etc.), recorded or captured user input, and/or predetermined information. For example, MSAM related storage 508 may include machine learning algorithms, algorithms for statistical analysis, and/or report generation logic. MSAM related storage 508 may also include user data, such as age, name, knowledge, skills, sex, and/or medical history. MSAM related storage 508 may also include predetermined information, including information gathered by clinical studies, patient and/or caregiver surveys, and/or doctor assessments.
In some embodiments, predetermined information may include information for analyzing responses; information for determining base responses; information for determining assessment thresholds; coping strategies; recommendations (e.g., for a caregiver or a child); treatment and/or related therapies; information for generating or selecting games, digital content, or related stimuli usable for performing a motor skills assessment; and/or other information.
In some embodiments, MSAM related storage 508 or another entity may maintain associations between relevant health information and a given user or a given population (e.g., users with a same condition, users with similar characteristics and/or within a similar geographical location). For example, users associated with different conditions and/or age groups may be associated with different recommendations, base responses, and/or assessment thresholds for indicating whether user responses are indicative of neurodevelopmental/psychiatric disorders.
In some embodiments, MSAM related storage 508 may be accessible by MSAM 504 and/or other modules of computing platform 500 and may be located externally to or integrated with MSAM 504 and/or computing platform 500. For example, MSAM related storage 508 may be stored at a server located remotely from a mobile device containing MSAM 504 but still accessible by MSAM 504. In another example, MSAM related storage 508 may be distributed or separated across multiple nodes.
It will be appreciated that the above described modules or entities are for illustrative purposes and that features or portions of features described herein may be performed by different and/or additional modules, components, or nodes. For example, aspects of motor skills assessment described herein may be performed by MSAM 504, computing platform 500, and/or other modules or nodes.
In step 602, touch input data associated with a user using a touchscreen may be obtained while the user plays a video game involving touching visual elements that move.
In step 604, the touch input data may be analyzed to generate motor skills assessment information associated with the user, where the motor skills assessment information indicates multiple touch related motor skills metrics.
In step 606, it may be determined, using the motor skills assessment information, that the user exhibits behavior indicative of a neurodevelopmental or psychiatric (neurodevelopmental/psychiatric) disorder.
In step 608, the motor skills assessment information, a diagnosis, or related data may be provided via a communications interface.
In some embodiments, process 600 may also include administering to the user a therapy for treating the neurodevelopmental/psychiatric disorder. In some embodiments, an administered therapy may include recommendations for improving motor skills or an interactive or dynamic game or digital content for improving motor skills over time. For example, MSAM 504 executing on a smartphone may provide a video game (e.g., similar to the video game used in a motor skills assessment or one configured based on the user's assessment score) to improve motor skills. In this example, MSAM 504 or another entity may monitor changes in motor skills over time (e.g., by monitoring game progress), and may use the gathered information to inform therapeutic strategies to improve motor skills (e.g., by changing game parameters, a skill level, or other settings over time). In another example, MSAM 504 executing on a smartphone may provide interactive content or recommendations to improve a user's social interaction skills and/or motor skills.
In some embodiments, the touch related motor skills metrics may relate to number of touches, number of misses, number of pops, a popping rate, touch duration, applied force, length of touch motion, number of touches per target, time spent targeting a visual element, touch frequency, touch velocity, popping accuracy, repeat percentage, distance to the center of a visual element, or number of transitions.
In some embodiments, determining, using the motor skills assessment information, that a user exhibits behavior indicative of a neurodevelopmental/psychiatric disorder may include comparing the motor skills assessment information to information from a population having the neurodevelopmental/psychiatric disorder.
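Such a population comparison could be sketched as flagging metrics whose z-scores deviate beyond a threshold from population norms; the metric names, norms, and the z-score rule are all hypothetical illustrations:

```python
def deviates_from_norms(metrics, norms, z_threshold=2.0):
    """Flag metrics deviating from population norms by more than z_threshold
    standard deviations (hypothetical comparison rule).

    metrics: dict metric -> observed value for the user
    norms: dict metric -> (population_mean, population_std)
    Returns dict of flagged metric -> z-score.
    """
    flagged = {}
    for name, value in metrics.items():
        mean, std = norms[name]
        z = (value - mean) / std
        if abs(z) > z_threshold:
            flagged[name] = z
    return flagged

# hypothetical user metrics vs. hypothetical population norms
flagged = deviates_from_norms(
    {"popping_rate": 0.2, "touch_length": 50.0},
    {"popping_rate": (0.8, 0.1), "touch_length": (48.0, 5.0)},
)
```

Here only the popping rate is flagged, since it falls far below the hypothetical population mean while the trajectory length is within normal range.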
In some embodiments, determining, using the motor skills assessment information, that a user exhibits behavior indicative of a neurodevelopmental/psychiatric disorder may include using the motor skills assessment information as input for a trained machine learning algorithm or model that outputs diagnostic or predictive information regarding the likelihood of the user having the neurodevelopmental/psychiatric disorder.
In some embodiments, a trained machine learning algorithm or model may take as input metrics related to digital phenotyping involving the user, e.g., metrics related to gaze patterns, social attention, facial expressions, facial dynamics, or postural control.
In some embodiments, a neurodevelopmental/psychiatric disorder may be an ASD, language delay, motor delay, intellectual or developmental delay, ADHD, an anxiety disorder diagnosis, or any combination thereof.
In some embodiments, motor skills assessment information or related data may be provided to a user, a medical records system, a service provider, a healthcare provider, a system operator, a caregiver of the user, or any combination thereof. For example, where information is provided to a clinician or a medical professional, motor skills assessment information may include stimuli used in a test, captured touch input data during the test, test results, and/or other technical or clinical information. In another example, where information is provided to a parent, a motor skills assessment report may include a metric associated with an easy-to-understand scale (e.g., 0-100%) for indicating the likelihood of a user (e.g., a child) having a particular neurodevelopmental/psychiatric disorder and useful suggestions for improving one or more related symptoms associated with the neurodevelopmental/psychiatric disorder.
In some embodiments, computing platform 500 may include a mobile device, a smartphone, a tablet computer, a laptop computer, a computer, a motor skills assessment device, or a medical device.
It will be appreciated that process 600 is for illustrative purposes and that different and/or additional actions may be used. It will also be appreciated that various actions described herein may occur in a different order or sequence.
It should be noted that computing platform 500, MSAM 504, and/or functionality described herein may constitute a special purpose computing device. Further, computing platform 500, MSAM 504, and/or functionality described herein can improve the technological field of diagnosing and treating various neurodevelopmental/psychiatric disorders by providing mechanisms for automated motor skills assessment using touch input data and related metrics (e.g., obtained or derived from user interaction with a touchscreen during a video game). Moreover, such mechanisms can alleviate many barriers, including costs, equipment, and human expertise, associated with conventional (e.g., clinical) methods of diagnosis and treatment of neurodevelopmental/psychiatric disorders.
The subject matter described herein for automated motor skills assessment improves the functionality of motor skills assessment devices and equipment by providing mechanisms that analyze a user's touch input data obtained or derived from a touchscreen to generate touch related motor skills metrics, use the metrics to determine motor skills assessment information for the user, and determine, using the motor skills assessment information, that the user exhibits or does not exhibit behavior indicative of a neurodevelopmental/psychiatric disorder. It should also be noted that computing platform 500 that implements subject matter described herein may comprise a special purpose computing device usable for various aspects of motor skills assessments, including videos containing region-based stimuli and/or gaze analysis.
Additional details and example methods, mechanisms, techniques, and/or systems for early detection of autism or related aspects are further described in the Examples herein below entitled “A tablet-based game for the assessment of visual motor skills in autistic children,” “Exploring Complexity of Facial Dynamics in Autism Spectrum Disorder,” “Complexity analysis of head movements in autistic toddlers,” and “Blink rate and facial orientation reveal distinctive patterns of attentional engagement in autistic toddlers: a digital phenotyping approach,” and additional Examples.
The presently disclosed subject matter will now be described more fully hereinafter with reference to the accompanying EXAMPLES, in which representative embodiments of the presently disclosed subject matter are shown. The presently disclosed subject matter can, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the presently disclosed subject matter to those skilled in the art.
Autism is a neurodevelopmental condition associated with challenges in socialization and communication. Early detection of autism ensures timely access to intervention. Autism screening questionnaires have lower accuracy when used in primary care. Here we report the results of a prospective study assessing the accuracy of an autism screening digital application (app) administered during a pediatric well-child visit to 475 children aged 17-36 months, 49 of whom were diagnosed with autism and 98 with developmental delay without autism. The app displayed stimuli that elicited behavioral signs of autism, which were quantified using computer vision and machine learning. An algorithm combining multiple digital phenotypes showed high diagnostic accuracy: area under the receiver operating characteristic curve (AUC)=0.90, sensitivity 87.8%, and specificity 80.8% distinguishing autistic versus neurotypical children; AUC=0.86, sensitivity 81.6%, and specificity 80.5% distinguishing autism versus non-autism. Results demonstrate that digital phenotyping is an objective, scalable approach to autism screening in real-world settings.
Autism spectrum disorder (henceforth “autism”) is a neurodevelopmental condition associated with qualitative challenges in social communication abilities and the presence of restricted and repetitive behaviors. Autism signs emerge between 9-18 months and include reduced attention to people, lack of response to name, differences in affective engagement and expressions, and motor delays, among other features.1 Commonly, children are screened for autism at their 18- and 24-month well-child visits using a parent questionnaire, the Modified Checklist for Autism in Toddlers-Revised with Follow Up (M-CHAT-R/F).2 The M-CHAT-R/F has been shown to have higher accuracy in research settings3 compared to real-world settings, such as primary care, particularly for girls and children of color.4-7 This is, in part, due to low rates of completion of the follow-up interview by pediatricians.8 A study of >25,000 children screened in primary care found that the M-CHAT/F's specificity was high (95.0%) but sensitivity was poor (39.0%).6 Thus, there is a need for accurate, objective, and scalable autism screening tools to increase the accuracy of autism screening and reduce disparities in access to early diagnosis and intervention, which can improve outcomes.9
A promising screening approach is the use of eye-tracking technology to measure children's attentional preferences for social versus non-social stimuli.10 Autism is characterized by reduced spontaneous visual attention to social stimuli.10 Studies of preschool and school-age children using machine learning (ML) of eye-tracking data reported encouraging findings for the use of eye tracking for distinguishing autistic and neurotypical children.11,12 However, because autism has a heterogeneous presentation involving multiple behavioral signs, eye-tracking tests alone may be insufficient as an autism screening tool. An eye-tracking screening measure of social attention evaluated in 1,863 12-48-month-old children had strong specificity (98.0%) but poor sensitivity (17.0%). The authors concluded that the eye-tracking task is useful for detecting a subtype of autism.13
By quantifying multiple autism-related behaviors, it may be possible to better capture the complex presentation of autism reflected in current diagnostic assessments. Digital phenotyping can detect differences between autistic and neurotypical children in social attention, head movements, facial expressions, and motor behaviors.14-18 We developed an application (app), SenseToKnow, which is administered on a tablet and displays brief, strategically designed movies while the child's behavioral responses are recorded via the frontal camera embedded in the device and quantified via computer vision analysis (CVA) and ML. The app elicits and measures a wide range of autism-related behaviors, including social attention, facial expressions and dynamics, head movements, response to name, blink rate, and motor skills (see
Quality and prediction confidence scores. Quality scores were automatically computed for each app administration, reflecting the number of available app variables weighted by their predictive power. Quality scores were found to be high (median score=93.9%, Q1-Q3 [90.0%-98.4%]), with no diagnostic group differences. A prediction confidence score for accurately classifying an individual child was also calculated. At the 20% threshold, 311/377 administrations were rated high confidence (see also EXAMPLE 2 for details).
Diagnostic accuracy of SenseToKnow for the detection of autism. Using all app variables, we trained a model comprised of K=1000 tree-based extreme gradient-boosting algorithms (XGBoost) to classify diagnostic groups.26
Based on the Youden Index,27 an algorithm integrating all app variables showed a high level of accuracy for classification of autism versus neurotypical development with AUC=0.90, CI [0.87-0.93], sensitivity 87.8% (SD=4.9), and specificity 80.8% (SD=2.3). Restricting administrations to those with high prediction confidence, the AUC increased to 0.93 (CI [0.89-0.96]). Table 2 shows performance results for autism versus neurotypical group classification based on individual and combined app variables. Classification of autism versus non-autism (DD-LD combined with neurotypical) also showed strong accuracy: AUC=0.86 (CI [0.83-0.90]), sensitivity 81.6% (SD=5.4), and specificity 80.5% (SD=1.8).
Nine autistic children who scored negative on the M-CHAT-R/F were correctly classified by the app as autistic, as determined by expert evaluation. Among 40 children screening positive on the M-CHAT-R/F, 2 were classified as neurotypical based on expert evaluation, and both were correctly classified by the app. Combining the app algorithm with the M-CHAT-R/F further increased classification performance to AUC=0.97 (CI [0.96-0.98]), specificity=91.8% (SD=4.5), and sensitivity=92.1% (SD=1.6).
Diagnostic accuracy of SenseToKnow for subgroups based on sex, race, and ethnicity. Classification performance of the app based on AUCs remained consistent when stratifying groups by sex (AUC=0.891, CI [0.826-0.956] for girls; AUC=0.896, CI [0.862-0.930] for boys), as well as by race, ethnicity, and age. Table 3 provides exhaustive results for all subgroups. However, confidence intervals were larger due to smaller subgroup sample sizes.
Model interpretability. Distributions for each app variable for autistic and neurotypical participants are shown in
SHAP interaction values indicated that interactions between predictors were significant contributors to the model: the average contribution of app variables alone was 64.6% (SD=3.4%), with 35.4% (SD=3.4%) attributable to feature interactions. Analysis of the missing-data SHAP values revealed that missing variables contributed to 5.2% (SD=13.2%) of the model predictions. See EXAMPLE 2 for details.
Individualized interpretability. Analysis of the individual SHAP values revealed individual behavioral patterns that explained the model's prediction for each participant.
When used in primary care, the accuracy of autism screening parent questionnaires has been found to be lower than in research contexts, especially for children of color and girls, which can increase disparities in access to early diagnosis and intervention. Studies using eye-tracking of social attention alone as an objective, quantitative index of autism have reported inadequate sensitivity, perhaps because assessments based on only one autism feature (differences in social attention) do not adequately capture the complex and heterogeneous clinical presentation of autism.
We evaluated the accuracy of an ML and CVA-based algorithm using multiple autism-related digital phenotypes assessed via a mobile app (SenseToKnow) administered on a tablet in primary care settings for identification of autism in a large sample of toddler-age children, the age at which screening is routinely conducted. The app captured the wide range of early signs associated with autism, including differences in social attention, facial expressions, head movements, response to name, blink rates, and motor skills, and was robust to missing data. ML allowed optimization of the prediction algorithm based on weighting different behavioral variables and their interactions. We demonstrated high levels of usability and diagnostic accuracy for classification of autistic versus neurotypical children and autistic versus non-autistic children (neurotypical and other developmental/language delays). The accuracy of SenseToKnow for detecting autism did not differ based on the child's sex, race, or ethnicity, suggesting that an objective digital screening approach that relies on direct quantitative observations of multiple behaviors may improve autism screening in diverse populations.
We developed methods for automatic assessment of the quality of the app administration and prediction confidence scores, both of which could facilitate the use of SenseToKnow in real world settings. The quality score provides a simple, actionable means of determining whether the app should be re-administered. This can be combined with a prediction confidence score which can inform a provider about the degree of certainty regarding the likelihood a child will be diagnosed with autism. Children with uncertain values could be followed to determine whether autism signs become more pronounced, whereas children with high confidence values could be prioritized for referral or begin intervention while the parent waits for their child to be evaluated. Using SHAP analyses, the app output provides interpretable information regarding which behavioral features are contributing to the diagnostic prediction for an individual child. Such information could be used prescriptively to identify areas in which behavioral intervention should be targeted. Notably, the app quantifies autism signs related to social attention, facial expressions, response to language cues, and motor skills, but does not capture behaviors in the restricted and repetitive behavior domain.
In the context of an overall pathway for autism diagnosis, our vision is that autism screening in primary care should be based on integrating multiple sources of information, including parent-report screening questionnaires and digital screening based on direct behavioral observation. Recent work suggests that ML analysis of a child's healthcare utilization patterns using data passively derived from the electronic health record could also be useful for early autism prediction.29 Results of the present study support this multimodal screening approach. A large study conducted in primary care found that the PPV of the M-CHAT/F was 14.6% and was lower for girls and children of color.6 In comparison, the PPV of the app in the present study was 40.6%, and the app performed similarly across children of different sex, race, and ethnicity. Furthermore, combining the M-CHAT-R/F with digital screening resulted in an increased PPV of 63.4%. Thus, our results suggest that a digital phenotyping approach will improve the accuracy of autism screening.
A possible limitation of the present study includes possible validation bias given that it was not feasible to conduct a comprehensive diagnostic evaluation on participants considered neurotypical. This was mitigated by the fact that diagnosticians were naïve with respect to the app results. The percentage of autism versus non-autism cases in this study is higher than in the general population, raising the potential for sampling bias. It is possible that parents who had developmental concerns about their child were more likely to enroll the child in the study. Although prevalence bias is addressed statistically by calibrating the performance metrics to the population prevalence of autism, this remains a possible limitation of the study. Accuracy assessments could potentially have been inflated due to differences in language abilities between the autism and developmental delay groups, although the two groups had similar nonverbal abilities. Future studies will evaluate the app's performance in an independent sample with children of different ages and language and cognitive abilities. Strengths of this study include the sample diversity, evaluation of the app in a real-world setting at an age at which autism screening is routinely conducted, and following children through age 4 years to determine final diagnosis.
We conclude that quantitative, objective, and scalable digital phenotyping offers promise in increasing the accuracy of autism screening and reducing disparities in access to diagnosis and intervention, complementing existing autism screening questionnaires. While we believe that this study represents a significant step forward in developing improved autism screening tools, accurate use of these screening tools requires training and systematic implementation by primary providers, and a positive screen must then be linked to appropriate referrals and services. Each of these touch points along the clinical care pathway contributes to the quality of early autism identification and can impact timely access to interventions and services that can influence long-term outcomes.
Study cohort. The study was conducted from December 2018 to March 2020 (Pro00085434). Participants were 475 children aged 17-36 months who were consecutively enrolled at one of four Duke University Health System (DUHS) pediatric primary care clinics during their well-child visit. Inclusion criteria were age 16-38 months, not currently ill, and a caregiver whose language was English or Spanish. Exclusion criteria were a sensory or motor impairment that precluded sitting for or viewing the app, unavailable clinical data, and the child being too upset at their well-child visit. Table 1 describes sample demographic and clinical characteristics.
Of 754 participants approached and invited to participate, 214 declined; 475 (93% of enrolled participants) completed study measures. All parents or legal guardians provided written informed consent, and the study protocol was approved by the Duke University Health System Institutional Review Board.
Diagnostic classification. Children were administered the M-CHAT-R/F,2 a parent survey querying different autism signs. Children with a final M-CHAT-R/F score of >2 or whose parents and/or provider expressed any developmental concern were provided a gold standard autism diagnostic evaluation based on the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2),29 DSM-5 criteria checklist, and Mullen Scales of Early Learning,30 conducted by a licensed, research-reliable psychologist who was blind with respect to app results. The mean duration between app screening and evaluation was 3.5 months, a similar or shorter duration compared to real-world settings. Diagnosis of autism spectrum disorder required meeting full DSM-5 diagnostic criteria. Diagnosis of developmental or language delay without autism (DD-LD) was defined as failing the M-CHAT-R/F and/or having provider or parent concerns and having been administered the ADOS-2 and Mullen Scales and determined by the psychologist not to meet diagnostic criteria for autism and exhibiting developmental and/or language delay based on the Mullen Scales (scoring >9 points below the mean on at least one Mullen Scales subscale; SD=10).
In addition, each participant's DUHS electronic health record (EHR) was monitored through age 4 years to confirm whether the child subsequently received a diagnosis of either autism spectrum disorder or DD-LD. Following validated methods used by Guthrie et al., children were classified as autistic or DD-LD based on their EHR if an ICD-9/10 diagnostic code for autism spectrum disorder or DD-LD (without autism) appeared more than once or was provided by an autism specialty clinic.6 If a child did not have an elevated M-CHAT-R/F score, no developmental concerns were raised by the provider or parents, and there were no autism or DD-LD diagnostic codes in the EHR through age four, they were considered neurotypical. There were 2 children who scored positive on the M-CHAT-R/F who were considered neurotypical based on expert diagnostic evaluation and had no autism or DD-LD EHR diagnostic codes.
Based on these procedures, 49 children were diagnosed with autism spectrum disorder (6 based on EHR only), 98 children were diagnosed DD-LD without autism (78 based on EHR only), and 328 children were considered neurotypical. Diagnosis of autism or developmental delay was made naïve to app results.
SenseToKnow app. Parents held their child on their lap while brief, engaging movies were presented on an iPad set on a tripod approximately 60 cm away from the child. Parents were asked to refrain from talking during the movies. The frontal camera embedded in the device recorded the child's behavior at a resolution of 1280×720 pixels and 30 frames per second. While children were watching the movies, their name was called three times by an examiner standing behind them at pre-defined timestamps. The children then participated in a “Bubble Popping” game using their finger to pop a set of colored bubbles that moved continuously across the screen. App completion took <10 minutes. English and Spanish versions were used. Additional details can be found in EXAMPLE 2.
App variables. CVA and ML were used to quantify multiple digital phenotypes. Detailed information regarding the identification and recognition of the child's face and the estimation of the frame-wise facial landmarks, head pose, and gaze has been described previously.19 Several CVA-based and touch-based behavioral variables were computed, described next.
Facing forward. During the social and non-social movies, we computed the average percentage of time the children faced the screen, retaining only frames that satisfied three rules: the eyes were open, the estimated gaze was at or close to the screen area, and the face was relatively steady. This measure, referred to as Facing Forward, was used as a proxy for the child's attention to the movies. See Chang et al. for methodological details.19
Social attention. The app includes two movies featuring clearly separable social and non-social stimuli on each side of the screen, designed to capture social/non-social attentional preference. The variable Gaze Percent Social was defined as the percentage of time the child gazed at the social half of the screen, and the Gaze Silhouette Score reflected how concentrated versus spread out the gaze clusters were. See Chang et al. for methodological details.19
Attention to speech. One of the movies features two actors, one on each side of the screen, taking turns in a conversation. We computed the correlation between the child's gaze patterns and the alternating conversation, defined as the Gaze Speech Correlation variable. See Chang et al. for methodological details.19
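As a rough sketch of how such a gaze-speech correlation could be computed, the snippet below correlates a per-frame encoding of which side of the screen the active speaker is on with the side the child is gazing at. The ±1 side encoding and the plain Pearson formulation are assumptions for illustration, not the app's documented implementation.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Per-frame side of the active speaker and of the child's gaze:
# +1 = right half of the screen, -1 = left half (encoding is illustrative).
speech_side = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
gaze_side   = [1, 1, -1, -1, -1, 1, 1, 1, -1, -1]
gaze_speech_correlation = pearson(gaze_side, speech_side)
```

A child whose gaze tracks the alternating conversation yields a correlation near 1; gaze unrelated to the turn-taking yields a value near 0.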
Facial dynamics complexity. The complexity of the facial landmarks' dynamics was estimated for the eyebrows and mouth regions of the child's face using multiscale entropy. We computed the average complexity of the mouth and eyebrows regions during social and non-social movies, referred to as the Mouth Complexity and Eyebrows Complexity. See Krishnappa Babu et al. for methodological details.20
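Multiscale entropy, as used here for the landmark time series, is coarse-graining followed by sample entropy at each scale. The sketch below is a generic textbook implementation, not the authors' code; the parameters m=2 and r=0.2 are common defaults assumed for illustration.

```python
import math

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) - len(x) % scale
    return [sum(x[i:i + scale]) / scale for i in range(0, n, scale)]

def sample_entropy(x, m=2, r=0.2):
    """SampEn: -log of the ratio of (m+1)-length to m-length template matches,
    with tolerance r * std(x)."""
    n = len(x)
    mu = sum(x) / n
    tol = r * (sum((v - mu) ** 2 for v in x) / n) ** 0.5
    def matches(mm):
        count = 0
        for i in range(n - mm + 1):
            for j in range(i + 1, n - mm + 1):
                if max(abs(x[i + k] - x[j + k]) for k in range(mm)) <= tol:
                    count += 1
        return count
    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")

def multiscale_entropy(x, scales=(1, 2, 3), m=2, r=0.2):
    return [sample_entropy(coarse_grain(x, s), m, r) for s in scales]

# Example: entropy of a smooth (highly predictable) trajectory at two scales.
series = [math.sin(i / 3.0) for i in range(60)]
mse = multiscale_entropy(series, scales=(1, 2))
```

More irregular landmark dynamics produce higher entropy values across scales; in the app, per-region averages of such curves give the Mouth Complexity and Eyebrows Complexity variables.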
Head movement. We evaluated the rate of head movement (computed from the time series of the facial landmarks) for social and non-social movies. Average head movement was referred to as Head Movement. Complexity and acceleration of the head movements were computed for both types of stimuli using multiscale entropy and derivative of the time series, respectively. See Krishnappa Babu et al. for methodological details.22
Response to name. Based on automatic detection of the name calls and the child's response to their name by turning their head computed from the facial landmarks, we defined two CVA-based variables: Response to Name Proportion, representing the proportion of times the child oriented to the name call, and Response to Name Delay, the average delay (in seconds) between the offset of the name call and head turn. See Perochon et al. for methodological details.23
Blink rate. During the social and non-social movies, CVA was used to extract the blink rates as indices of attentional engagement, referred to as Blink Rate. See Krishnappa Babu et al. for methodological details.24
Touch-based visual-motor skills. Using the touch and device kinetic information provided by the device sensors when the child played the bubble popping game (see EXAMPLE 2), we defined Touch Popping Rate as the ratio of popped bubbles over the number of touches, Touch Error Variation as the standard deviation of the distance between the child's finger position when touching the screen and the center of the closest bubble, Touch Average Length as the average length of the child's finger trajectory on the screen, and Touch Average Applied Force as the average estimated force applied on the screen when touching it. See Perochon et al. for methodological details.25
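The four touch-based variables reduce to simple statistics over touch events. The sketch below computes them from hypothetical touch records; the record layout, field values, and trajectories are invented for illustration and are not the app's actual data schema.

```python
import math
from statistics import mean, stdev

# Hypothetical touch records: (x, y, force, popped, nearest_bubble_x, nearest_bubble_y).
touches = [
    (100.0, 200.0, 0.30, True,  104.0, 203.0),
    (150.0, 250.0, 0.45, False, 170.0, 260.0),
    (300.0, 120.0, 0.35, True,  301.0, 119.0),
    (310.0, 130.0, 0.50, True,  312.0, 133.0),
]
# Per-touch finger trajectories (lists of (x, y) points) -- illustrative.
trajectories = [
    [(100.0, 200.0), (103.0, 202.0)],
    [(150.0, 250.0), (160.0, 255.0), (170.0, 258.0)],
    [(300.0, 120.0)],
    [(310.0, 130.0), (311.0, 131.0)],
]

# Touch Popping Rate: popped bubbles over the number of touches.
touch_popping_rate = sum(t[3] for t in touches) / len(touches)

# Touch Error Variation: std. dev. of the distance between the touch point
# and the center of the closest bubble.
errors = [math.dist((t[0], t[1]), (t[4], t[5])) for t in touches]
touch_error_variation = stdev(errors)

# Touch Average Length: mean length of the finger trajectory per touch.
def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
touch_average_length = mean(path_length(p) for p in trajectories)

# Touch Average Applied Force: mean estimated force applied to the screen.
touch_average_applied_force = mean(t[2] for t in touches)
```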
In total, we measured 23 app-derived variables, comprised of 19 CVA-based and 4 touch-based variables. Additional information regarding the computation of these variables, missing data rate, and their pairwise correlation coefficients is detailed in EXAMPLE 2.
Statistical analysis. Using the app variables, we trained a model comprised of K=1000 tree-based extreme gradient-boosting algorithms (XGBoost) to differentiate diagnostic groups.26 For each XGBoost model, 5-fold cross-validation was used while shuffling the data to compute individual intermediary binary predictions and SHapley Additive exPlanations (SHAP) value statistics (metrics mean and standard deviation).28 The final prediction confidence scores, between 0 and 1, were computed averaging the K predictions (see EXAMPLE 2). We implemented a five-fold nested cross validation stratified by diagnosis group to separate the data used for training the algorithm and the evaluation on unseen data.32 Missing data were encoded with a value out of the range of the app variables, such that the optimization of the decision trees considered the missing data as information. Overfitting was controlled using a tree maximum depth of 3, subsampling app variables at a rate of 80%, and using regularization parameters during the optimization process. Diagnostic group imbalance was addressed by weighting training instances by the imbalance ratio. Details regarding the algorithm and hyperparameters are provided in the EXAMPLE 2. The contribution of the app variables to individual predictions was assessed by the SHAP values, computed for each child using all other data to train the model and normalized such that the features' contributions to the individual predictions range from 0 to 1. See details in EXAMPLE 2.
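The ensemble's final confidence score is the average of the K intermediary binary predictions; how the 5%-20% uncertainty thresholds map onto that score is not fully specified here, so the band-around-0.5 reading below is an assumption, as flagged in the comments.

```python
from statistics import mean

def confidence_score(binary_predictions):
    """Average of the K intermediary binary predictions (K = 1000 in the study)."""
    return mean(binary_predictions)

# One plausible reading of the uncertainty thresholds (5%-20%): an administration
# is "uncertain" when its score falls within the threshold band around 0.5.
def is_high_confidence(score, threshold=0.20):
    return abs(score - 0.5) >= threshold
```

For example, a child for whom 900 of 1000 ensemble members predict autism receives a score of 0.9 and would be rated high confidence at every threshold, whereas a score of 0.55 would be rated uncertain.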
A quality score was computed based on the number of available app variables weighted by their predictive power (measured as their relative importance to the model; see details in EXAMPLE 2).
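A toy sketch of such a weighted-availability score follows; the variable names and importance weights below are invented for illustration, not the model's actual importances.

```python
# Hypothetical relative importances (predictive power) and availability flags.
importances = {"gaze_percent_social": 0.30, "response_to_name_proportion": 0.25,
               "head_movement": 0.20, "blink_rate": 0.15, "touch_popping_rate": 0.10}
available = {"gaze_percent_social": True, "response_to_name_proportion": True,
             "head_movement": True, "blink_rate": False, "touch_popping_rate": True}

# Quality score: importance-weighted share of variables the administration yielded,
# expressed as a percentage.
quality_score = (100.0 * sum(w for k, w in importances.items() if available[k])
                 / sum(importances.values()))
```

An administration missing only a low-importance variable thus keeps a high quality score, while losing a highly predictive variable penalizes it more heavily.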
The prediction confidence score quantified the confidence in the model's prediction and was used to analyze the varying performance of the app when removing app administrations rated uncertain based on different thresholds (5%, 10%, 15%, and 20%; see details in EXAMPLE 2). Performance was evaluated using the area under the receiver operating characteristic curve (ROC AUC), with 95% confidence intervals (CI) computed using the Hanley-McNeil method.32 Unless otherwise mentioned, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were defined using the operating point of the ROC that optimized the Youden index, with equal weight given to sensitivity and specificity.13 Given that the study sample autism prevalence differs from that of the general population in which the screening tool would be used (π_population ≈ 2%), we also report the adjusted PPV and NPV to provide a more accurate estimation of the app's performance as a screening tool deployed at scale in practice (see details in EXAMPLE 2). Statistics were calculated in Python V.3.8.10, using SciPy low-level functions V.1.7.3, and the official XGBoost and SHAP implementations, V.1.5.2 and V.0.40.0, respectively.
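The Youden-optimal operating point and the prevalence adjustment are standard formulas; a minimal sketch follows. The ROC points are illustrative, while the sensitivity/specificity values fed to the adjustment are the autism-versus-neurotypical figures reported above.

```python
def youden_operating_point(roc_points):
    """Pick the ROC point maximizing Youden's J = sensitivity + specificity - 1.
    roc_points: list of (threshold, sensitivity, specificity) tuples."""
    return max(roc_points, key=lambda p: p[1] + p[2] - 1)

def adjusted_predictive_values(sens, spec, prevalence):
    """Standard Bayes-rule adjustment of PPV/NPV to a deployment prevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# Illustrative ROC points (not the study's actual curve).
roc = [(0.3, 0.90, 0.70), (0.5, 0.85, 0.82), (0.7, 0.70, 0.95)]
best = youden_operating_point(roc)

# The reported autism-vs-neurotypical operating point, adjusted to ~2% prevalence.
ppv, npv = adjusted_predictive_values(0.878, 0.808, 0.02)
```

At a 2% population prevalence, even a sensitive and specific screener yields a modest PPV (here under 10%) while NPV stays very high, which is why the adjusted values matter for deployment at scale.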
a DD-LD: developmental delay and/or language delay.
b M-CHAT-R/F: Modified Checklist for Autism in Toddlers, Revised with Follow Up.
c ADOS-2: Autism Diagnostic Observation Schedule - Second Edition.
a PPV and NPV values adjusted for population prevalence (see EXAMPLE 2)
b Gaze silhouette score, gaze speech correlation, and gaze percent social
c Modified Checklist for Autism in Toddlers - Revised with Follow-up final score
a Non-autistic group (neurotypical + DD-LD);
b Autistic;
c Neurotypical (NT);
d Autistic + DD-LD;
e DD-LD. Correct: number of correct diagnosis predictions; Not Correct: number of incorrect predictions.
The disclosure of each of the following references is incorporated herein by reference in its entirety to the extent that it is not inconsistent herewith and to the extent that it supplements, explains, provides a background for, or teaches methods, techniques, and/or systems employed herein. The numbers below correspond to the superscripted numbers in EXAMPLE 1.
The stimuli (brief movies) and game used in the SenseToKnow app are as follows and are illustrated in
Videos were first analyzed using face detection and face recognition algorithms, which ensured that the facial information of the target participant was analyzed while the facial information of others in the room was ignored. Human supervision was used to validate the outcome of the facial recognition procedure if the algorithm confidence was below a predefined threshold. On average, fewer than 10 frames per video required manual input. We extracted 49 facial landmark points consisting of 2D positional coordinates that were time-synchronized with the movies. Using the facial landmarks, for each frame we computed the child's head pose angles relative to the tablet's frontal camera, namely θyaw (left-right), θpitch (up-down), and θroll (tilting left-right), as described in Hashemi et al. The participant's gaze information (projected onto the device screen) was extracted using an automatic gaze estimation algorithm based on a pre-trained deep neural network. This information was leveraged to extract relevant CVA-based behavioral phenotypes. See the following for more information: Baltrusaitis et al. Openface 2.0: Facial behavior analysis toolkit. IEEE International Conference on Automatic Face & Gesture Recognition 2018: 59-66; King. Dlib-ml: A machine learning toolkit. J Machine Learning Research 2009; 10: 1755-8; De la Torre et al. IntraFace. IEEE Int Conf Automatic Face Gesture Recognition Workshops 2015; 1; Hashemi et al. Computer Vision Analysis for Quantification of Autism Risk Behaviors. IEEE Transactions on Affective Computing 2021; 12(1): 215-26; Krafka et al. Eye tracking for everyone. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016: 2176-84.
Response to name. While children were watching the movies, their name was called three times by an examiner standing behind them. The analysis of the video audio allowed us to automatically detect the instant in which the participant's name was called, while the dynamics of the head orientation (pitch, yaw, and roll) were used to automatically detect if and when the child responded (oriented) toward the person calling their name. Computed vision and audio algorithms were originally proposed in Campbell and colleagues and improved by Perochon and colleagues. Both studies found that autistic toddlers oriented to their name less frequently, and those who oriented had a longer latency/delay in doing so. In the present work, we followed the implementation details described in Perochon and colleagues. Based on the automatic detection of the name call and the head turn events, we defined two CVA-based variables: the response to name ratio representing the proportion of times the participant oriented to their name call, and the response to name delay, which reflected the average delay (in seconds) between the beginning of the name call and the beginning of the head turn. See Campbell et al. Computer vision analysis captures atypical attention in toddlers with autism. Autism 2019; 23(3): 619-28 and Perochon et al. A scalable computational approach to assessing response to name in toddlers with autism. J Child Psychol Psychiatry 2021; 62(9): 1120-31 for more information.
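Once name-call and head-turn events are detected, the two variables reduce to simple event arithmetic. The sketch below uses hypothetical event times; the onset-based delay and the None-for-no-response convention are illustrative assumptions, not the published detection pipeline.

```python
# Hypothetical event times (seconds): detected name-call onsets, and the onset of
# the child's head turn for each call (None = no orienting response detected).
name_calls = [12.0, 45.0, 78.0]
head_turns = [13.1, None, 79.0]

responses = [(call, turn) for call, turn in zip(name_calls, head_turns)
             if turn is not None]

# Response to Name Proportion: fraction of calls the child oriented to.
response_to_name_proportion = len(responses) / len(name_calls)

# Response to Name Delay: average latency between call and head turn (seconds).
response_to_name_delay = sum(turn - call for call, turn in responses) / len(responses)
```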
Social attention. For the movies Spinning Top and Blowing Bubbles, we computed the percent of time the child gazed toward the social stimulus. The average across the two movies was represented by the variable Gaze Percent Social. Chang and colleagues reported that autistic toddlers spent more time gazing at the non-social stimulus compared to the social stimulus. See Chang et al. Computational Methods to Measure Patterns of Gaze in Toddlers With Autism Spectrum Disorder. JAMA Pediatr 2021; 175(8): 827-36.
During the Fun at the Park movie, in which two women take turns in a conversation, we evaluated the correlation of the gaze left-right patterns with the patterns of speech, reflected in the variable Gaze Speech Correlation. Chang and colleagues reported that the autistic toddlers' gaze patterns were less coordinated with the flow of conversational speech. See Chang et al., 2021.
For both the social attention (Gaze Percent Social) and attention to speech (Gaze Speech Correlation) variables, gaze information was first estimated from facial and eye appearance using a pretrained deep neural network model. After this initial estimation, an attention proxy was constructed by combining the following automatically computed rules: the child's eyes are open, the estimated gaze is within the screen region of interest, the head pose is oriented toward the screen, and periods in which the head is moving quickly (e.g., during head turns) are excluded. All these steps were fully automatic and based on CVA; implementation details are provided by Chang and colleagues. Restricting the gaze information to the intervals in which the children attended to the movies (measured by our CVA-based proxy), we computed three behavioral variables. See Krafka et al. Eye tracking for everyone. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016: 2176-84 and Chang et al., 2021.
Gaze silhouette score. During the social attention and attention to speech movies (Spinning Top, Blowing Bubbles, and Fun at the Park, respectively), it would typically be expected that children would alternate their gaze between the relevant social and non-social stimuli located on the right and left sides of the screen. As described by Chang et al., 2021, a clustering methodology can be used to automatically and robustly evaluate when participants are looking distinctly toward the right and left portions of the screen where relevant social and non-social stimuli are located (versus looking in a less structured/focused pattern), since the concentration of clusters can be mathematically assessed using the silhouette score. See Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 1987; 20: 53-65. Chang and colleagues found that autistic toddlers showed less coherent (clustered) gaze patterns. Using this approach, we defined the Gaze Silhouette Score as the average silhouette score over the three movies. This measure quantified how concentrated (focused gaze) or spread the gaze distribution was throughout these three movies.
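The silhouette-based measure can be illustrated with a minimal one-dimensional version over horizontal gaze positions: for each gaze point, compare its mean distance to its own left/right cluster against its mean distance to the other cluster. The two-cluster setup and data below are invented for illustration; the study used the clustering methodology of Chang et al.

```python
def silhouette_1d(xs, labels):
    """Mean silhouette over 1-D points; labels: 0 = left cluster, 1 = right cluster.
    Assumes each cluster has at least two points."""
    total = 0.0
    for i, (x, l) in enumerate(zip(xs, labels)):
        # a = mean distance to other points in the same cluster.
        same = [abs(x - y) for j, (y, m) in enumerate(zip(xs, labels))
                if m == l and j != i]
        # b = mean distance to points in the other cluster.
        other = [abs(x - y) for y, m in zip(xs, labels) if m != l]
        a, b = sum(same) / len(same), sum(other) / len(other)
        total += (b - a) / max(a, b)
    return total / len(xs)

# Tightly clustered gaze (clear left/right alternation) vs. diffuse gaze.
focused = silhouette_1d([0.10, 0.12, 0.90, 0.92], [0, 0, 1, 1])
diffuse = silhouette_1d([0.30, 0.60, 0.45, 0.75], [0, 1, 0, 1])
```

A score near 1 indicates distinctly concentrated looks at the two stimulus regions; overlapping, less structured gaze drives the score toward 0.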
Head movement. Using the facial landmarks associated with the corners of the eyes and the tip of the nose, we assessed the participant's head movement while they attended to the movies. The proxy for attention defined above was used to select the frames of interest, and the variation in the distance between the eyes was used to add invariance with respect to the distance to the screen. We computed the average head movement across each stimulus according to previously established studies. See Dawson et al. Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder. Sci Rep 2018; 8(1): 17008 and Krishnappa Babu et al. Complexity analysis of head movements in autistic toddlers. J Child Psychol Psychiatry 2023; 64(1): 156-66. Dawson and colleagues reported that autistic toddlers showed more frequent head movement while watching the movies, accentuated during movies of high social content.18 We compared the rate of head movement across movies that contain a high level of social content (Spinning Top, Blowing Bubbles, Make Me Laugh, Playing with Blocks, and Fun at the Park) and those that contain primarily toys and animated objects (Floating Bubbles, Dog in Grass, and Mechanical Puppy). The average head movement across these two sets of movies was referred to as Head Movement during social and non-social stimuli.
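A sketch of how such a scale-invariant head movement measure could be computed from per-frame landmarks, assuming hypothetical (n_frames, 2) landmark arrays; the exact normalization choice here is an illustration, not the authors' implementation:

```python
import numpy as np

def head_movement(nose, left_eye, right_eye):
    """Mean frame-to-frame nose-tip displacement, normalized by the
    inter-eye distance so the measure is invariant to how far the child
    sits from the screen. Each argument is an (n_frames, 2) array of
    landmark coordinates in pixels."""
    inter_eye = np.linalg.norm(left_eye - right_eye, axis=1)
    step = np.linalg.norm(np.diff(nose, axis=0), axis=1)
    # normalize each displacement by the mean inter-eye distance of its two frames
    scale = (inter_eye[:-1] + inter_eye[1:]) / 2
    return float(np.mean(step / scale))
```

A perfectly still head yields 0; the value grows with the rate of head motion regardless of the child's distance from the tablet.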
Facing forward. A child's orientation towards the screen, i.e., ‘facing forward’ during any given frame, was defined using their (i) head pose angle, (ii) eye gaze, and (iii) rapidity of head movement. A head pose satisfying |θyaw| < 25° was used as a proxy for attentional focus on the screen, which is supported by the central bias theory for gaze estimation. See Li et al. Learning to predict gaze in egocentric video. Proceedings of the IEEE International Conference on Computer Vision 2013: 3216-23 and Mannan et al. Automatic control of saccadic eye movements made in visual inspection of briefly presented 2-D images. Spat Vis 1995; 9(3): 363-86. Then, for each frame we checked whether the estimated gaze of the participant was on the tablet's screen and their eyes were open. Finally, we excluded the frames where the head was moving rapidly (this can lead to errors in the CVA). To this end, we first performed smoothing of the head pose signal θyaw, obtaining θyaw′. The head was considered to be moving rapidly if at any point the θyaw′ of the current frame was >150% of that of the previous frame. Finally, the Facing Forward variable was estimated as the percentage of frames ‘facing forward’ out of the number of frames for each movie (ranging between 0 and 100). Details on the algorithm are presented in the supplementary materials of Chang et al., 2021.
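The per-frame rules above can be sketched as follows; the smoothing window, the moving-average smoother, and the toy inputs are assumptions for illustration rather than the published algorithm:

```python
import numpy as np

def facing_forward_pct(yaw_deg, gaze_on_screen, eyes_open, yaw_thresh=25.0, k=5):
    """Percentage of frames in which the child is 'facing forward':
    |yaw| below threshold, gaze on the screen, eyes open, and head not
    moving rapidly. gaze_on_screen and eyes_open are boolean arrays."""
    yaw = np.asarray(yaw_deg, dtype=float)
    # smooth the yaw signal with a simple moving average (window k)
    yaw_s = np.convolve(yaw, np.ones(k) / k, mode="same")
    # rapid movement: smoothed yaw jumps to >150% of the previous frame's value
    prev = np.abs(yaw_s[:-1]) + 1e-9
    rapid = np.concatenate([[False], np.abs(yaw_s[1:]) > 1.5 * prev])
    ok = (np.abs(yaw) < yaw_thresh) & gaze_on_screen & eyes_open & ~rapid
    return 100.0 * ok.mean()
```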
Blink rate. Using CVA, we estimated the participant's number of blinks while they were watching each of the presented movies, as described next. OpenFace, a facial analysis toolkit that offers facial action units on a frame-by-frame basis, was used. These action units are based on the standard facial action coding system per Ekman. For the blinking action, we used action unit 45 (AU45) to estimate the child's blinks. A smoothing of the AU45 time-series signal was performed, followed by detecting the number of peaks, which are associated with blink actions. To obtain the blink rate, we normalized the number of blinks with respect to the number of valid frames. The valid frames were defined as frames during which the child was (i) facing forward (as defined above) and (ii) the confidence outcome of OpenFace was at or above the recommended threshold (i.e., 0.75). For more details, see Krishnappa Babu et al. Blink rate and facial orientation reveal distinctive patterns of attentional engagement in autistic toddlers: a digital phenotyping approach. Scientific Reports 2023, 13(1): 7158.
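A minimal sketch of this pipeline, substituting a moving-average smoother and a simple local-maximum peak detector for the unspecified smoothing and peak-detection methods; the AU45 intensity threshold of 0.5 is an assumption:

```python
import numpy as np

def blink_rate(au45, conf, fps=30.0, conf_thresh=0.75, k=3):
    """Blinks per second estimated from an OpenFace AU45 time series.
    Frames whose tracking confidence is below conf_thresh are discarded
    from the valid-frame count used for normalization."""
    x = np.asarray(au45, dtype=float)
    # moving-average smoothing to suppress single-frame noise
    x = np.convolve(x, np.ones(k) / k, mode="same")
    # a blink is a local maximum of the smoothed signal above a fixed intensity
    peaks = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) & (x[1:-1] > 0.5)
    n_blinks = int(peaks.sum())
    valid = int((np.asarray(conf) >= conf_thresh).sum())
    return n_blinks / (valid / fps) if valid else 0.0
```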
Touch-based (visual motor skills) variables. We collected the tablet touch and device kinetic information while participants played the bubble popping game. In a recent study, we reported findings related to several touch-based variables. See Perochon et al. A tablet-based game for the assessment of visual motor skills in autistic children. NPJ Digit Med 2023; 6(1): 17. Based on the results of this previous study, we further evaluated four touch-based variables explored in the present study: the Touch Popping Rate (TPR), the Touch Average Length (TAL), the Touch Average Applied Force (TAAF), and the Touch Error Std (TES). The TPR is defined as the number of popped bubbles over the number of total touches to the screen, and it provides a notion of the accuracy and overall performance during the game. The TAL evaluated the average touch length, meaning the average length of the trajectory of the child's finger while it is in contact with the screen. The TAAF measures the average force associated with each individual touch, which can be estimated from the data collected by the tablet sensors. Finally, the TES is defined as the standard deviation of the distance between the child's finger position when hitting the screen and the center of the closest bubble. See also Video 2 showing a child playing the game.
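The four touch variables can be illustrated on hypothetical per-touch records; the field names and values below are invented for the example and do not reflect the actual data schema:

```python
import numpy as np

# Hypothetical per-touch records: finger trajectory points, force samples,
# whether the touch popped a bubble, and the distance (in pixels) from the
# first contact point to the center of the closest bubble.
touches = [
    {"traj": [(0, 0), (3, 4)],         "force": [0.2, 0.3], "popped": True,  "err": 5.0},
    {"traj": [(1, 1), (1, 1)],         "force": [0.4],      "popped": False, "err": 30.0},
    {"traj": [(0, 0), (0, 5), (5, 5)], "force": [0.1, 0.1], "popped": True,  "err": 10.0},
]

def traj_length(points):
    """Total length of the finger trajectory while in contact with the screen."""
    p = np.asarray(points, float)
    return float(np.linalg.norm(np.diff(p, axis=0), axis=1).sum())

tpr = sum(t["popped"] for t in touches) / len(touches)    # Touch Popping Rate
tal = np.mean([traj_length(t["traj"]) for t in touches])  # Touch Average Length
taaf = np.mean([np.mean(t["force"]) for t in touches])    # Touch Average Applied Force
tes = float(np.std([t["err"] for t in touches]))          # Touch Error Std
```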
The app variables pairwise correlation coefficients are shown in
We now explain the computation of the prediction confidence score, which is used to compute the model performances and assess the certainty of our approach to predict one of the diagnosis groups. The heterogeneity of the autistic condition implies that some autistic toddlers may show more complex behavioral patterns not primarily captured by the app variables, or only by a subset of them. The same holds for non-autistic participants who may exhibit behavioral patterns typically associated with autism (e.g., not responding to their name, or responding verbally instead). From a data science perspective, these challenging cases may be represented in ambiguous regions of the app variables space, as their variables might have a mix of autistic and neurotypical-related values. Therefore, the decision boundaries associated with these regions of the variable space may fluctuate when training the algorithm over different splits of the dataset, which we used to reveal the difficult cases. We counted the proportion of positive and negative predictions of each participant, over the K=1000 experiments. The distribution of these averaged predictions for each participant (which we called prediction confidence score; see
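The per-participant prediction confidence score described above reduces to averaging the binary predictions across splits; a minimal sketch:

```python
import numpy as np

def prediction_confidence(pred_matrix):
    """pred_matrix: (K, N) array of 0/1 predictions for N participants over
    K train/test splits. Returns, per participant, the proportion of positive
    predictions: values near 0 or 1 indicate stable, confident predictions,
    while values near 0.5 flag ambiguous, difficult cases."""
    return np.asarray(pred_matrix, float).mean(axis=0)
```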
We used the standard implementation of XGBoost as provided by the authors. We used all default parameters of the algorithm, except the ones listed below, which we changed to account for the relatively small sample size and the class imbalance, and to prevent overfitting. n_estimators=100; max_depth=3 (default is 6, prone to overfitting in this setting); objective=“binary:logistic”; booster=“gbtree”; tree_method=“exact” instead of “auto” since the sample size is relatively small; colsample_bytree=0.8 instead of 0.5 due to the relatively small sample size; subsample=1; colsumbsample=0.8 instead of 0.5 due to the relatively small sample size; learning_rate=0.15 instead of 0.3; gamma=0.1 instead of 0 to prevent overfitting, as this is a regularization parameter; reg_lambda=0.1; alpha=0.
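Collected as a Python dictionary, the non-default settings above read as follows (a sketch; the garbled “colsumbsample” entry is omitted because the intended colsample_* parameter cannot be determined from the text, and all other keys keep the XGBoost defaults):

```python
# Hyperparameter configuration as described above; values copied from the text.
xgb_params = {
    "n_estimators": 100,
    "max_depth": 3,              # default 6 is prone to overfitting here
    "objective": "binary:logistic",
    "booster": "gbtree",
    "tree_method": "exact",      # exact splits are affordable at this sample size
    "colsample_bytree": 0.8,
    "subsample": 1,
    "learning_rate": 0.15,       # default is 0.3
    "gamma": 0.1,                # minimum loss reduction; acts as regularization
    "reg_lambda": 0.1,
    "alpha": 0,
}
```

With the xgboost package, these settings would be passed as `xgboost.XGBClassifier(**xgb_params)`.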
Shapley Additive exPlanations (SHAP) values, which originated in the field of cooperative game theory, are a state-of-the-art method employed to shed light on “black box” machine learning algorithms. This framework benefits from strong theoretical guarantees to explain the contribution of each input variable to the final prediction, accounting for and estimating the contributions of the variables' interactions. See Molnar, Chapter 9.6 for a gentle introduction to the theoretical aspects of SHAP values.
In this work, the SHAP values were computed and stored for each sample of the test sets when performing cross-validation, i.e., training a different model every time with the rest of the data. Therefore, we first needed to normalize the SHAP values to compare them across different splits. The normalized contribution of app variable k (k∈[1,K]) for an individual i (i∈[1,N]) is
We conserved the sign of the SHAP values as it indicates the direction of the contribution, either toward autistic or neurotypical-related behavioral patterns.
Because the learning algorithm used is robust to missing values, an individual i may have a missing value for variable k that the algorithm will nevertheless use to compute a diagnosis prediction. In this case, the contribution (i.e., a SHAP value) of the missing data to the final prediction, still denoted as ϕki, accounts for the contribution of this variable being missing.
To disambiguate the contribution of actual variable values from their missingness, we set to 0 the SHAP value associated with variable k for that sample and defined it as ϕZ
This process leads to 2NK SHAP values for the study cohort, used to compute:
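One way to sketch the normalize-then-zero procedure described above (the per-individual L1 normalization is an assumption, since the normalization formula itself is not reproduced here):

```python
import numpy as np

def normalize_shap(shap_vals, missing_mask):
    """shap_vals: (N, K) SHAP values from one cross-validation split;
    missing_mask: (N, K) boolean, True where the input variable was missing.
    Returns two (N, K) arrays: the signed contributions normalized per
    individual (so values are comparable across splits), and a copy with
    the contributions of missing variables zeroed out."""
    phi = np.asarray(shap_vals, float)
    denom = np.abs(phi).sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0
    phi_norm = phi / denom                      # sign preserved: direction of contribution
    phi_zeroed = np.where(missing_mask, 0.0, phi_norm)
    return phi_norm, phi_zeroed
```

Keeping both arrays for every individual and variable is consistent with the 2NK values mentioned above: one set including the missingness contributions, one set with them zeroed.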
We now explain how the quality score is computed for each app administration, based on the amount of available information computed using the app data, weighted by the predictive ability (or variables importance) of each of the app variables. This score, between 0 and 1, quantifies the potential for the collected data on the participant to lead to a meaningful prediction of autism.
Computation of the app variables confidence (prediction confidence score). Given the set of app variables (xki)k∈[1,K] for a participant i, we first compute a measure of confidence (or certainty) of each app variable, denoted by (ρki)k∈[1,K]. The intuition behind the computation of these confidence scores follows the weak law of large numbers which states that the average of a sufficiently large number of observations will be close to the expected value of the measure. We describe next the computation of the app variables confidence scores ρ.
Computation of the app variables predictive power. When assessing the quality of the administration, one might want to put more weight on variables that contribute the most to the predictive performance of the model. Therefore, to compute the quality score of an administration, we used the normalized app variables importance (G(Xk))k∈[1,K] to weight the app variables. Note that for computing the predictive power of the app variables, we used only the SHAP values of available variables, setting to 0 the SHAP values of missing variables.
Computation of the app administration quality score. After we compute for each administration i the confidence score (ρki)k∈[1,K]of each app variable (xki)k∈[1,K] and gain an idea of their expected predictive power (EX[G(Xk)])k∈[1,K], the quality score is computed as:
When all variables are missing, (ρki)k∈[1,K]=(0, . . . , 0), the score is equal to 0, and when all the app variables were measured with the maximum amount of information, (ρki)k∈[1,K]=(1, . . . , 1), then the quality score is equal to the sum of normalized variables contributions, which is equal to 1.
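A minimal sketch of the quality score as the importance-weighted sum of per-variable confidences, matching the two boundary cases above; the function signature is illustrative:

```python
import numpy as np

def quality_score(rho, importance):
    """Administration quality score: per-variable confidence scores rho
    (each in [0, 1], with 0 when the variable is missing) weighted by
    normalized variable importances that sum to 1."""
    return float(np.dot(np.asarray(rho, float), np.asarray(importance, float)))
```

With all variables missing (rho all 0), the score is 0; with full information (rho all 1), it equals the sum of the normalized importances, i.e., 1.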
The prevalence of autism in the cohort analyzed in this study, as in many studies in the field, differs from the reported prevalence of autism in the broader population. While the 2018 prevalence of autism in the United States is one in forty-four, the analyzed cohort in this study is composed of 49 autistic participants and 328 non-autistic participants.
Some screening tool performance metrics, such as the specificity, sensitivity, or the Area Under the ROC Curve (AUROC), are invariant to such prevalence differences, as their values do not depend on the group ratio (e.g., the sensitivity only depends on the measurement tool performance on the autistic group; the specificity only depends on the measurement tool performance on the non-autistic group). Therefore, provided an unbiased sampling of the population and a large enough sample size, the reported prevalence-invariant metrics should provide a good estimate of what the value of those metrics would be if the tool were implemented in the general population.
However, precision-based performance measures, such as the precision (or Positive Predictive Value; PPV), the Negative Predictive Value (NPV), or the Fβ scores, depend on the autism prevalence in the analyzed cohort. Thus, these measures provide inaccurate estimates of the expected performance when the measurement tool is deployed outside of research settings.
Therefore, we now report the expected performance we would have if the autism prevalence in this study were the one in the population, following the procedure detailed in Siblini et al. Master Your Metrics with Calibration. In: Berthold et al. (eds) Advances in Intelligent Data Analysis XVIII. Cham: Springer International Publishing; 2020. p. 457-69.
For a reference prevalence, πpopulation, and a study prevalence of πstudy, the corrected PPV (or precision), corrected NPV, and Fβ are:
and
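The corrected metrics, whose formulas are elided above, take the standard prevalence-calibration form; the following is a reconstruction consistent with the cited procedure, with $s$ the sensitivity, $c$ the specificity, and $\pi = \pi_{\mathrm{population}}$ the reference prevalence:

```latex
\mathrm{PPV}_{\mathrm{corr}} = \frac{s\,\pi}{s\,\pi + (1-c)(1-\pi)}, \qquad
\mathrm{NPV}_{\mathrm{corr}} = \frac{c\,(1-\pi)}{c\,(1-\pi) + (1-s)\,\pi}, \qquad
F_{\beta,\mathrm{corr}} = \frac{(1+\beta^{2})\,\mathrm{PPV}_{\mathrm{corr}}\,s}{\beta^{2}\,\mathrm{PPV}_{\mathrm{corr}} + s}
```

Because $s$ and $c$ are prevalence-invariant, substituting the population prevalence $\pi$ into these expressions yields the precision-based metrics one would expect outside the research cohort.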
Table S1 provides the performance of the XGBoost model trained to differentiate autistic from neurotypical participants using all app variables, for all the cut-off thresholds defining the operating points of the associated ROC curve.
We provide the visualization of the dependence between the app variables values and their contribution to the model. This allows us to understand which ranges of the variables' values correspond to an increase or decrease of the model's prediction towards the autistic or neurotypical groups. The app variables are ordered by their global importance to the model; see
Increasing evidence suggests that early motor impairments are a common feature of autism. Thus, scalable, quantitative methods for measuring motor behavior in young autistic children are needed. This work presents an engaging and scalable assessment of visual-motor abilities based on a bubble-popping game administered on a tablet. Participants are 233 children ranging from 1.5 to 10 years of age (147 neurotypical children and 86 children diagnosed with autism spectrum disorder [autistic], of which 32 are also diagnosed with co-occurring attention-deficit/hyperactivity disorder [autistic+ADHD]). Computer vision analyses are used to extract several game-based touch features, which are compared across autistic, autistic+ADHD, and neurotypical participants. Results show that younger (1.5-3 years) autistic children pop the bubbles at a lower rate, and their ability to touch the bubble's center is less accurate compared to neurotypical children. When they pop a bubble, their finger lingers for a longer period, and they show more variability in their performance. In older children (3-10 years), consistent with previous research, the presence of co-occurring ADHD is associated with greater motor impairment, reflected in lower accuracy and more variable performance. Several motor features are correlated with standardized assessments of fine motor and cognitive abilities, as evaluated by an independent clinical assessment.
These results highlight the potential of touch-based games as an efficient and scalable approach for assessing children's visual motor skills, which can be part of a broader screening tool for identifying early signs associated with autism.
Early detection of autism provides an opportunity for early intervention, which can improve developmental trajectories and strengthen social, language, cognitive, and motor competencies during a period of heightened brain plasticity1-4. The current standard of care for autism screening most often relies on a caregiver questionnaire, such as the Modified Checklist for Autism in Toddlers-Revised (MCHAT-R/F), which is used for neurodevelopmental screening in children between 16-30 months of age5,6.
Although useful, the MCHAT-R/F has lower accuracy when administered in real-world settings, such as primary care7,8. Furthermore, the MCHAT-R/F's performance is influenced by the family's socioeconomic status, maternal education level, and the child's sex, race, and ethnicity7-10. Thus, new objective screening and assessment tools based on direct assessment of the child's behavior are needed that can complement screening approaches based on caregiver questionnaires.
While autism is fundamentally characterized by qualitative differences in social and communication domains, impairments in motor abilities have also been documented in autistic children11-15. The prevalence estimates of motor impairments in autism range from 50-85%14,16-19; these estimates could potentially represent lower bounds since they are limited by the sensitivity of current assessment methods15. Motor impairments often are one of the earliest reported signs associated with autism20-22, and have been documented in autistic children without cognitive impairment19. Thus, early assessment of motor skills could be a useful component of an early screening battery for autism. Several aspects of motor skills have been studied in autism, including gait and balance stability, coordination, movement accuracy, reaction time, manual dexterity, tone, hyperkinesis, and praxis15,21. Various methods have been used to assess such skills using non-gamified paradigms, such as quantifying horizontal arm swings23, variations in reaching to grasp24 or touch25, handwriting26, and gait27.
Research suggests that differences in motor skills associated with autism emerge during infancy. LeBarton and Landa examined motor skills in 6-month-old infants with and without an older sibling with autism. Motor skills at 6 months predicted both an autism diagnosis and level of expressive language acquisition by 30-36 months28. These findings are consistent with other studies that have reported that the early development of motor skills is associated with expressive language outcomes among autistic children29,30. A recent study of patterns of health care utilization in infants who were later diagnosed with autism found a higher rate of physical therapy visits below age 1, underscoring the early manifestation of motor impairments in autism31.
Studies that have sought to characterize the nature of motor impairments in autism have found that autistic children are particularly challenged by tasks that require efficient visual-motor integration32. Visual-motor integration ability affects many domains of functioning, including imitation, which is fundamental for developing social skills. There is some evidence supporting a bias toward proprioceptive feedback over visual feedback in autism33,34. The tablet-based bubble-popping game developed for this study requires the temporal coordination of a dynamic visual stimulus with a motor response involving touch. As such, it is well suited to assess this aspect of early motor development.
The development of miniaturized inertial sensors, wearable sensors, and the ubiquity of mobile devices such as tablets and smartphones have allowed unprecedented access to massive multimodal data acquisition that has been used to characterize motor behavior. These data have been used to derive predictors of Parkinson's severity35, identify and quantify an autism motor signature and characterize the nature of motor impairments in autism36-45. These studies demonstrate the usefulness of tablet-based assessments and games for assessing motor skills.
In the present EXAMPLE, we sought to extend current research findings in three ways. First, we sought to evaluate a tablet-based, gamified visual motor assessment in toddlers at the age when autism screening is typically conducted. Second, intellectual abilities have been found to be correlated with motor impairment in autistic children;24,46 thus, we accounted for the contribution of co-occurring cognitive impairment to motor ability in our analyses.
Third, as ADHD has also been associated with motor impairment, we sought to examine the combined contribution of autism and ADHD to the level and nature of motor impairment47. Previous studies have found that the prevalence of motor impairment among autistic individuals increases when there is co-occurring cognitive impairment and/or psychiatric conditions, including ADHD. One study found that the proportion of autistic children with motor impairment increased by 4.4% if the child had co-occurring ADHD. However, this study also found that the nature of motor impairment in autism versus ADHD may differ48. Research suggests that, while autism has been associated with impairment in visual-proprioceptive integration, motor difficulties in ADHD tend to be associated with variability in the accuracy and speed of movement34.
The bubble popping game examined in this study is one part of a mobile application (app) developed by our team that displays developmentally appropriate and strategically designed movies while recording the child's behavioral responses to the stimuli49. The app is administered on smartphones and tablets and does not require spoken language or literacy. Direct observation offers a unique opportunity for capturing and objectively quantifying various aspects of child behavior. We have previously reported results from children's behavioral responses to the movies, which have been found to differentiate autistic from neurotypical toddlers50-56. In the current work, we focused on the bubble popping game, which utilizes inertial and touch features. Based on previous studies, we predicted that autistic children would have a distinct performance on the bubble-popping game, and this pattern would differ between autistic children with versus without co-occurring ADHD. Additionally, we examined whether the motor digital phenotypes derived from the game correlated with standardized measures of cognitive, language, and motor abilities, as well as level of autism-related behaviors, to better understand the relationship between children's motor behavior and their clinical profiles.
In summary, our goals were to: (i) assess motor behavior in children as young as 18 months using a tablet-based game to distinguish autism and neurotypical development at the age at which autism screening is typically conducted, (ii) control for the effects of cognitive ability in our analyses, (iii) evaluate the impact of co-occurring ADHD on motor function in young autistic children, and (iv) evaluate several novel visual-motor features derived from a simple, scalable game and their relationships with children's clinical profiles.
Correlations between motor performance and age. We first examined whether the age of the participants was correlated with performance on the game. Combining samples from studies 1 and 2, results indicate that there was a strong correlation between the participants' age and their game performance. Age had a significant positive association with the number of touches (rho=0.62, p<1e-25, N=233) and the bubble popping rate (rho=0.50, p<1e-17, N=233); and a significant negative association with the median distance to the center (rho=−0.48, p<1e-16, N=233), the average touch duration (rho=−0.70, p<1e-36, N=233) and the average touch length (rho=−0.63, p<1e-28, N=233). Given these associations, age was added as a covariate for all group comparisons and correlations in both studies.
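Age-partialled Spearman correlations of the kind reported in this work can be sketched by rank-transforming the variables, regressing out the covariate's ranks, and correlating the residuals; this is one common construction, not necessarily the exact procedure used:

```python
import numpy as np

def rankdata(x):
    """Average ranks; tied values receive the mean of their rank positions."""
    x = np.asarray(x, float)
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):          # average the ranks within each tie group
        m = x == v
        ranks[m] = ranks[m].mean()
    return ranks

def partial_spearman(x, y, z):
    """Spearman correlation between x and y, controlling for covariate z
    (e.g., age), via the Pearson correlation of rank residuals."""
    rx, ry, rz = rankdata(x), rankdata(y), rankdata(z)
    A = np.vstack([rz, np.ones_like(rz)]).T
    def resid(r):
        beta, *_ = np.linalg.lstsq(A, r, rcond=None)
        return r - A @ beta
    ex, ey = resid(rx), resid(ry)
    return float(np.corrcoef(ex, ey)[0, 1])
```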
Study 1: Comparisons of younger autistic versus neurotypical children. Autistic and neurotypical participants in study 1 did not statistically differ in terms of their previous experience playing tablet-based games (Z=0.96, p=0.33; proportion Z-test). The level of engagement/compliance was not a significant factor, indicated by the high completion rate, higher than 95% for both groups. The age distribution comparison between the age-matched neurotypical group (N=128) and autistic group was statistically non-significant (p=0.07, r=0.23; two-sided Mann-Whitney-U test). The two groups did not differ in terms of the mean number of touches, indicating similar levels of overall engagement with the game. However, the two groups were found to statistically differ in terms of several other motor variables.
Study 2: Comparisons of older autistic versus neurotypical children. We first compared the autistic group (including those with cooccurring ADHD) and the neurotypical group in terms of their game performance. The two groups were found to differ in level of cognitive ability (p=2e-5, r=0.64; two-sided Mann-Whitney-U test) but not age (p=0.15, r=0.21; two-sided Mann-Whitney-U test); thus, we included both age and IQ, as reflected in their General Conceptual Ability (GCA) score, as covariates in these analyses. The level of engagement, as reflected in the mean number of touches, did not differ between autistic and neurotypical children (F (1,78)=0.428, p=0.77, η2=0.01; one-way ANCOVA). However, autistic children showed a significantly lower average touch frequency (F (1,57)=14.77, p=1.1e-2, η2=0.21), and a lower median time spent targeting a bubble (F (1,57)=10.79, p=2.0e-2, η2=0.16).
Study 2: Comparisons of older autistic children with and without ADHD. Children with and without ADHD did not differ in terms of age (p=0.052, r=0.28), previous experience playing video games (Z=−1.08, p=0.28; proportion Z-test), or their cognitive ability (IQ) based on their GCA on the DAS (p=0.68, r=0.06; two-sided Mann-Whitney-U test).
Combining features for group discrimination. For study 1, we hypothesized that combining multiple features would improve discrimination of autistic and neurotypical toddlers. To this end, we trained logistic regression models to infer the participant's clinical diagnosis from the touch-based features and performed leave-one-out cross-validation to assess the generalization performance of these models. We compared the performance of individual features and combinations of them to assess their complementarity.
For study 2, we also hypothesized that combining the motor related features would improve group discrimination. The same previously described feature selection procedure was used. The ROC curve in
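The feature-combination analysis can be sketched with a plain logistic regression under leave-one-out cross-validation; the gradient-descent fitter below stands in for whichever solver was actually used, and the toy data are hypothetical:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression; returns weights
    (last entry is the intercept)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def loocv_scores(X, y):
    """Leave-one-out cross-validation: each participant is scored by a
    model trained on all the other participants."""
    scores = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w = fit_logistic(X[mask], y[mask])
        xb = np.append(X[i], 1.0)
        scores[i] = 1.0 / (1.0 + np.exp(-xb @ w))
    return scores
```

The held-out scores can then be fed to an ROC analysis to compare single features against feature combinations.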
Study 1. Correlations between motor performance and clinical characteristics
Spearman's rho correlation was used to assess the relationship between motor features and clinical variables, with statistical significance computed using a Student's t-distribution. We first examined the partial correlations between motor performance and the clinical characteristics based on clinician-administered measures, controlling for age, for the autistic children in study 1, including their performance on the Mullen Scales of Early Learning (MSEL) and the Autism Diagnostic Observation Schedule (ADOS total calibrated severity score). Partial correlations are illustrated in
Study 2. Correlations between motor performance and clinical characteristics. Spearman's rho correlation was again used to assess the relationship between motor features and clinical variables, with statistical significance computed using a Student's t-distribution. We examined the partial correlations between motor performance and the clinical characteristics, controlling for age, for the autistic children in study 2, including their performance on the ADOS total calibrated severity score, ADHD rating-scale total score, and the DAS. These analyses included children with and without co-occurring ADHD. Partial correlations are shown in
Given increasing evidence of the role of motor impairments in autism, objective and accurate evaluation of fine motor skills is an important component of a comprehensive behavioral assessment of autism. We found that an easy-to-administer and engaging bubble popping game can collect meaningful, quantitative, and objective measures of early motor skills in children ranging from 18 months to 10 years of age. Data were feasibly collected in both clinical research settings and pediatric primary care clinics with minimal instructions, using a tablet and without special equipment and training. Therefore, this simple yet informative tool has the potential to be deployed at scale to enhance detection and assessment of early autism signs and obtain objective and quantitative measures of toddler and school-age children's visual motor skills. Our results suggest that toddlers as young as 18 months old and children up to 10 years old showed a significant level of engagement with the game. Importantly, autistic and neurotypical children were equally likely to complete the game and touched the screen with similar frequency. In addition to a simple and engaging game that children of a wide age range can readily use, we engineered a set of touch and sensory-based features from the information recorded by the device. Features to evaluate the participants' performance (e.g., number of touches, popping accuracy), their fine motor skills (e.g., popping accuracy, touch duration, applied force), and their preference for repetitive behaviors (e.g., repeat percentage, screen exploration) were measured.
We observed in both groups that several motor variables, including number of touches, bubble popping rate, median distance to the center, average touch duration, and average touch length, were correlated with age, suggesting that these features are promising as means to assess children's developmental trajectories in visual motor skills. Even after controlling for age by matching groups on this variable and using age as a covariate, several differences in visual motor skills between autistic and neurotypical children emerged. In the younger toddler sample, autistic children popped the bubbles at a lower rate despite an equal number of touches, and their ability to touch the center of the bubble was less accurate. When they popped a bubble, their finger lingered for a longer period, consistent with previous findings57, and they showed more variability in their performance. In the older sample, compared to neurotypical children, the autistic children spent a longer period of time on a targeted bubble rather than moving quickly from one bubble to another.
Consistent with previous research47, the presence of co-occurring ADHD was associated with lower visual motor skills. We found that autistic children with ADHD had lower accuracy (average distance from the center), lower number of pops despite an equal number of touches, higher number of touches per target, and overall, more variability in their motor behavior. These results are consistent with previous research showing that ADHD is associated with reduced visual motor accuracy and greater variability34. Finally, we proposed several game-based features and demonstrated that they can be aggregated in simple machine learning algorithms, trained to combine behavioral measurements to discover patterns that distinguish diagnostic groups, offering a potential to use such algorithms based on motor performance to differentiate toddlers and children with neurotypical development, autism, and those with or without co-occurring ADHD.
We also examined whether the motor features derived from the game showed meaningful correlations with independent clinical assessments of the autistic children. In autistic toddlers, several motor features were found to be correlated with the fine motor T-score on the Mullen scales, including pop rate and accuracy, double-touching, touch velocity and duration, and variability in touch popping accuracy (rho=−0.52). Overall IQ was found to be correlated with the number of pops and popping accuracy.
Previous studies of infants who are later diagnosed with autism have found that early motor skills are associated with language acquisition28-30. We found that the number of different bubbles targeted during the game and the proportion of the screen explored by touch were positively associated with the expressive language T-score of the Mullen Scales. Interestingly, repetitive behavior during the game, reflected in the repeated popping of the same bubble, was positively associated with the Mullen visual reception T-score. It is possible that children with stronger visual perception skills were more likely to notice that the same bubble would appear after they popped it rather than quickly exploring other bubbles. Thus, the bubble-popping game might be able to identify visual perceptual strengths in autistic children. Finally, no associations between the motor features and level of autism related behaviors on the ADOS were found in the toddler group.
In the older group, children with higher overall IQ, as well as those with higher spatial skills and nonverbal reasoning skills, tended to show stronger visual motor skills, as reflected in a greater number of bubbles popped as well as other features. Spatial skills measured on the Differential Abilities Scales, in particular, were consistently correlated with strong visual motor skills, as reflected in a higher number of bubbles popped, average touch duration and velocity, lower variation in the force applied, and average time spent targeting a bubble. Unlike in the younger sample of children, fewer correlations between motor features and language ability were found. Higher verbal skills were correlated only with the number of touches.
Gaming patterns hold promise for assessing children's motor skills and potentially detecting early differences in motor behaviors associated with autism and ADHD. In the present study, we examined the distributions of the touch-based features and observed that many of the motor features differentiated autistic and neurotypical toddlers, as well as autistic children with and without co-occurring ADHD. When comparing neurotypical and autistic participants, we observed that, on average, neurotypical children exhibited greater visual motor control and accuracy. Both groups showed a similar level of engagement with the game (touching the screen a similar number of times). Still, neurotypical participants played the game with quicker and more accurate touches. Autistic children with co-occurring ADHD touched more of the screen and were less accurate and more variable in their motor responses. These findings underscore the role of co-occurring ADHD in accounting for variability in motor skills in autistic children.
Limitations of this work include the relatively limited number of participants available for per-demographic and per-sex analyses. The relatively small sample of autistic participants also limits the evaluation of the generalization ability of machine learning algorithms. Studies 1 and 2 had different clinical measures, limiting the possibility of comparing their relationship with motor variables in a broader sample. Longer games, beyond 20 seconds, might provide information about learning, focus, and anticipation. For study 1 of younger children, although it is possible that a child in the neurotypical group had an autism diagnosis, developmental or language delay, or both, it was not feasible to administer diagnostic and cognitive testing to all children. Children in the neurotypical group did not have a positive score on the M-CHAT-R/F, and their parents and providers did not express a developmental concern.
This work and the informative data presented here are important steps towards characterizing the heterogeneity of motor functions in autism. Further work is needed to understand, differentiate, and disentangle motor differences associated with co-occurring psychiatric conditions. Additionally, leveraging ecological tools for the longitudinal quantification of motor function could be beneficial for the development of evidence-based interventions targeting visual motor impairments.
The tools proposed here are designed in the context of a broader effort to develop objective, digital behavioral phenotyping tools. Because children's developmental trajectories are variable, it will be of interest to use digital phenotyping to longitudinally track a wider range of behaviors that can be captured with computer vision analysis, including gaze patterns/social attention52, facial expressions/dynamics51,55, postural control58, and fine motor control. The present study is a step in that direction. Future work includes evaluating the features proposed here in combination with others, advancing toward a multi-modal solution that objectively describes the rich and diverse realm of developmental variation precisely and quantitatively.
Participants. Study 1 comprised 151 children between 18 and 36 months of age, 23 of whom were subsequently diagnosed with autism spectrum disorder (ASD) based on DSM-5 criteria (see below). Children were recruited and assessed during their well-child visit at one of four Duke pediatric primary care clinics. Inclusion criteria were an age of 16-38 months, not being ill, and a caregiver language of English or Spanish. Exclusion criteria were a sensory or motor impairment that precluded sitting or viewing the app, the parent not being interested or not having time to participate, the child being too upset following the doctor appointment, the caregiver popping the bubbles, or insufficient clinical information. From a larger group of neurotypical participants recruited for the study, neurotypical participants were selected randomly within the age range that matched the autistic group to limit any potential effects of age on analyses of group differences.
Study 2 comprised an independent sample of 82 children between 36 and 120 months of age. Based on a diagnostic evaluation (see below), of the 82 children, 63 had a DSM-5 diagnosis of ASD, of which 32 had co-occurring ADHD, and 19 were neurotypical (NT). Children were recruited from the community through flyers and brochures, emails, social media posts, and the research center's registry. Inclusion criteria were an age of 36-120 months, not being ill, and a caregiver language of English or Spanish. Exclusion criteria included a known genetic (e.g., fragile X) or neurological syndrome or condition with an established link to autism, a history of epilepsy or seizure disorder (except for a history of simple febrile seizures or if the child had been seizure-free for the past year), a motor or sensory impairment that would interfere with the valid completion of study measures, and a history of neonatal brain damage (e.g., a diagnosed hypoxic or ischemic event).
In both studies, participants were excluded if the child did not understand the game (18 participants; NT=13, Autistic=5, Autistic+ADHD=0; none of the study 2 participants failed to understand the game) or if caregivers popped the bubbles when the child was supposed to pop the bubbles by themselves (5 participants), as reported by the trained research assistant administering the app. Children who did not engage sufficiently in the game, defined as having touched the screen fewer than three times, were also excluded from the analysis (NT=29, Autistic=3, Autistic+ADHD=0).
Table 1 below describes the participants' age, sex, and other demographic characteristics. Caregivers/legal guardians provided written informed consent, and the study was approved by the Duke University Health System Institutional Review Board (Pro00085434, Pro00085435, Pro00085156).
Clinical assessments. In study 1, caregivers completed the Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F)6 during the well-child visit when the game was administered. The M-CHAT-R/F is a caregiver-report screening questionnaire that asks about autism-related behaviors.
Children who failed the M-CHAT-R/F and/or children for whom the caregiver or physician expressed a developmental concern were referred for a diagnostic evaluation conducted by a licensed and research-reliable psychologist. The average time between referral for evaluation and completing an evaluation was 3.5 months. The diagnostic evaluation included the Autism Diagnostic Observation Schedule—2 (ADOS-2)59 and the Mullen Scales of Early Learning (MSEL)60, the latter of which yielded an Early Learning Composite Score (ELC) and the following subscale scores: (a) fine motor, (b) visual reception, (c) receptive language, and (d) expressive language. Children in study 1 were not evaluated for co-occurring ADHD because such a diagnosis is not considered reliable before age 3 years. Children were considered neurotypical if they did not fail the M-CHAT-R/F, and neither the caregiver nor their provider expressed a developmental concern. Neurotypical children did not receive a diagnostic or cognitive evaluation.
In study 2, an autism spectrum disorder diagnosis was established by a research-reliable clinical psychologist based on the ADOS-2 and the Autism Diagnostic Interview—Revised (ADI-R)61. Cognitive ability was assessed via the Differential Abilities Scale (DAS)62. Co-occurring DSM-5 ADHD diagnosis was established by a licensed clinical psychologist with expertise in ADHD (Davis) via the Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI-Kid) with supplementary questions for assessing ADHD in children63, a brief clinical child interview when appropriate, review of the parent-completed ADHD Rating Scale (ADHD-RS)64, review of teacher-completed ADHD-RS forms when available, and clinical consensus based on clinical observations and these instruments. The ADHD-RS yielded an overall ADHD-RS score and Hyperactivity and Impulsivity subscale scores. For study 2, neurotypical children were defined as having an IQ>70, Vineland Adaptive Behavior Scale scores in the average range65, and no clinical elevations on a set of parent-completed rating scales, including the Child Behavior Checklist66, ADHD-RS, and the Social Responsiveness Scale67. Clinical data were collected using REDCap software.
Pop the bubbles game. The bubble-popping game was delivered at the clinic directly following the well-child visit with the pediatrician. During the app, two types of stimuli are presented. First, a set of brief movies (in total, <10 min) with social and non-social content were displayed on the device's screen. While the child watched the movies, the device's front-facing camera was used to capture their facial expressions, gaze, and postural/facial dynamics. Next, the bubble-popping game was presented. Caregivers were asked to hold their child on their lap, and the child was positioned such that they could independently and comfortably touch the iPad's screen and play the game. The iPad was placed on a tripod, around 50 cm from the participant, allowing a sufficient dynamical response of the tripod when the touchscreen is touched while preserving the stability of the device. To minimize distractions during the app administration, other family members and the research staff were asked to stay behind both the caregiver and the child. First, the caregiver was encouraged to pop a few bubbles as a demonstration. Once the child had popped two bubbles independently, the training session ended, and the analyzed data began to be recorded for 20 seconds. By design, a bubble popped when the starting location of a touch was within 18.5 mm of its center. Furthermore, when the child popped a bubble, an identical bubble (i.e., same color) began to ascend from the bottom of the screen and returned to the same location. This component of the game allowed an assessment of repetitive versus exploratory behavior (popping a different bubble than last popped). During the data collection, caregivers were instructed not to touch the screen nor provide any further instructions to the child. We used 7th and 8th generation iPads, both with 10.2-inch screens.
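The pop criterion described above (a touch registers as a pop when its starting location falls within 18.5 mm of a bubble's center) can be sketched as a simple Euclidean hit test. The helper names and the points-to-millimeter conversion factor below are illustrative assumptions, not the app's actual implementation:

```python
import math

POP_RADIUS_MM = 18.5        # pop threshold from the study design
MM_PER_POINT = 25.4 / 132   # assumed ~132 points per inch for a 10.2" iPad

def touch_pops_bubble(touch_xy, bubble_xy, mm_per_point=MM_PER_POINT):
    """Return True if the touch's starting location lands within 18.5 mm
    of the bubble center. Coordinates are in screen points; the conversion
    factor is an assumption for illustration."""
    dx = (touch_xy[0] - bubble_xy[0]) * mm_per_point
    dy = (touch_xy[1] - bubble_xy[1]) * mm_per_point
    return math.hypot(dx, dy) <= POP_RADIUS_MM

# A touch on the bubble center pops it; a distant touch does not.
print(touch_pops_bubble((100, 100), (100, 100)))  # True
print(touch_pops_bubble((100, 100), (400, 400)))  # False
```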
With a sampling rate of 60 Hz, on-device high-precision inertial and gyroscopic sensors recorded the acceleration and orientation of the device; screen-based events, such as bubble pops and screen touches, were also recorded. Inertial data were used to compute a proxy for the pressure applied on the screen. At the end of the game, caregivers were asked how frequently their child used tablets or smartphones; among those who responded (244/274, 89.1%), 94.3% of caregivers reported their child had previous experience watching or playing games on a tablet or smartphone (43% frequently, 33% occasionally, and 24% rarely).
Feature extraction. Using the touch data collected and the tablet kinetic information provided by the device sensors, we computed a set of features representing the participants' motor behavior. More precisely we defined: (1) number of touches, representing the total number of unique times the participant touched the screen, see
Statistical analysis. Differences in previous experience with electronic games were assessed using a proportion Z-test. Group differences in age and IQ were assessed using a two-sided Mann-Whitney-U test. Effect size, denoted as ‘r’, was evaluated with the rank-biserial correlation algorithm68. Spearman's rho correlation was used to assess the relationship between motor features and clinical variables, with statistical significance computed using a Student's t-distribution69. Group comparisons were made using one-way ANCOVA for motor-related variables, with the diagnostic group as the categorical predictor (autistic/NT and autistic/ADHD+autistic). We used age as a covariate for study 1 sample, and age and IQ as covariates for study 2. Eta-squared, denoted as η2, was calculated to quantify effect sizes.
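As an illustrative sketch (not the study's code) of the group-difference statistics described above, the two-sided Mann-Whitney U test, the rank-biserial effect size, and Spearman's rho can be computed with SciPy as follows; the data here are synthetic placeholders:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nt_feature = rng.normal(1.0, 0.2, 40)    # synthetic neurotypical group
asd_feature = rng.normal(0.8, 0.3, 23)   # synthetic autistic group

# Two-sided Mann-Whitney U test for a group difference on one motor feature.
u, p = stats.mannwhitneyu(nt_feature, asd_feature, alternative="two-sided")

# Rank-biserial correlation as the effect size: r = 1 - 2U / (n1 * n2).
n1, n2 = len(nt_feature), len(asd_feature)
r = 1 - 2 * u / (n1 * n2)

# Spearman's rho between a motor feature and a clinical score; SciPy's
# p-value is computed from a Student's t-distribution by default.
clinical_score = rng.normal(50, 10, 23)
rho, p_rho = stats.spearmanr(asd_feature, clinical_score)

print(f"U={u:.1f}, p={p:.3f}, rank-biserial r={r:.2f}, rho={rho:.2f}")
```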
Benjamini-Hochberg correction was applied to p-values to control the false discovery rate (FDR)68. Significance was set at the 0.05 level. Logistic regression was used to assess performance for individual motor features and their combination. We started by using the features that most strongly differentiated the two groups, then selected the feature leading to the best AUC performance. This commonly used type of greedy approach helped address the statistical challenges of high-dimensional data. Leave-one-out cross-validation was used to evaluate the generalization performance of models, as recommended in the case of relatively small sample sizes70. Scikit-learn71 implementations LogisticRegression and GridSearchCV were used to define models and find optimal parameters for each set of motor features. The span of evaluated hyperparameters included: “C” in [0.01, 100], “penalty” in [“l1”, “l2”, “none”], “dual” in [True, False], “fit_intercept” in [True, False], and “solver” in [“liblinear”, “lbfgs”].
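The greedy feature-selection loop with leave-one-out cross-validation described above might be sketched as follows; the features and labels are synthetic, the hyperparameter search is omitted for brevity, and the helper name is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))  # 5 synthetic motor features, 60 participants
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 1, 60) > 0).astype(int)

def loocv_auc(cols):
    """AUC of a logistic regression on the given feature columns,
    scored with leave-one-out cross-validation."""
    probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X[:, cols], y,
        cv=LeaveOneOut(), method="predict_proba")[:, 1]
    return roc_auc_score(y, probs)

# Greedy forward selection: repeatedly add the feature that most improves AUC.
selected, remaining, best_auc = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {c: loocv_auc(selected + [c]) for c in remaining}
    c_best = max(scores, key=scores.get)
    if scores[c_best] <= best_auc:
        break
    selected.append(c_best)
    remaining.remove(c_best)
    best_auc = scores[c_best]

print(f"selected features: {selected}, LOOCV AUC = {best_auc:.2f}")
```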
During the training process, we addressed class imbalance by up-sampling the minority group. Models used for prediction were evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, with 95% confidence intervals computed by the Hanley-McNeil method72. Statistics were calculated in Python using SciPy V.1.4.1 low-level functions, Statsmodels V.0.10.1, and Pingouin V.0.3.473-75.
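The Hanley-McNeil confidence intervals mentioned above follow from a closed-form standard error for the AUC; a minimal sketch (not the study's code) is:

```python
import math

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """95% CI for an AUC via the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    se = math.sqrt(
        (auc * (1 - auc)
         + (n_pos - 1) * (q1 - auc**2)
         + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)

# Example: an AUC of 0.80 with 23 participants per group.
lo, hi = hanley_mcneil_ci(0.80, n_pos=23, n_neg=23)
print(f"AUC 0.80, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Note how the interval widens as the group sizes shrink, which matches the wider confidence intervals reported for the smaller samples.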
The disclosure of each of the following references is incorporated herein by reference in its entirety to the extent that it is not inconsistent herewith and to the extent that it supplements, explains, provides a background for, or teaches methods, techniques, and/or systems employed herein. The numbers below correspond to the superscripted numbers in EXAMPLE 3.
The AUC values were relatively consistent across groups; however, confidence intervals were larger due to the smaller sample sizes. A leave-one-out cross-validation approach was used. Features used to fit the model were the average length, the average touch duration, and the average time spent for the study 1 sample, and the average distance to the center, the number of targets, and the screen exploratory percentage for the study 2 sample.
Atypical facial expression is one of the early symptoms of autism spectrum disorder (ASD), characterized by reduced regularity and lack of coordination of facial movements. Automatic quantification of these behaviors can offer novel biomarkers for screening, diagnosis, and treatment monitoring of ASD. In this work, 40 toddlers with ASD and 396 typically developing toddlers were shown developmentally-appropriate and engaging movies presented on a smart tablet during a well-child pediatric visit. The movies included social and non-social dynamic scenes designed to evoke certain behavioral and affective responses. The front-facing camera of the tablet was used to capture the toddlers' faces. Facial landmarks' dynamics were then automatically computed using computer vision algorithms. Subsequently, the complexity of the landmarks' dynamics was estimated for the eyebrows and mouth regions using multiscale entropy. Compared to typically developing toddlers, toddlers with ASD showed higher complexity (i.e., less predictability) in these landmarks' dynamics. This complexity in facial dynamics contained novel information not captured by traditional facial affect analyses. These results suggest that computer vision analysis of facial landmark movements is a promising approach for detecting and quantifying early behavioral symptoms associated with ASD.
Facial expressions are often used as a mode of communication to initiate social interaction with others1,2, and are one of the key social behaviors used by infants during early development3. Individuals with autism spectrum disorder (ASD) often experience challenges in establishing social communication coupled with difficulties in recognizing facial expressions and using them to communicate with others4. Reduced sharing of affect and differences in use of facial expressions for communication are core symptoms of ASD and are assessed as part of standard diagnostic evaluations, such as the Autism Diagnostic Observation Schedule (ADOS)5. Children with ASD more often display neutral affect and ambiguous expressions compared to children with other developmental delays and typically developing (TD) children6.
Standardized observational assessments of ASD symptoms require highly trained and experienced clinicians7. Research on facial expressions usually involves manual coding of observations of facial expressions from recorded videos, based on complex and time-intensive facial action coding systems. These methods are difficult to deploy at scale and universally. Therefore, researchers have been employing technological advancements to capture facial expressions using motion capture and computer vision (CV)8,9. The application of CV can help to quantify the intensity of emotional expression and the atypicality of facial expressions7. Prior work in CV shows that quantification of the differential ability in producing facial expressions can distinguish children with typical development versus ASD10. CV can also help in understanding the lag in developmental stages of facial expression production, offering cues to understand the challenges in emotional competence faced by individuals with ASD11.
Exploiting CV, it was shown that the facial expressions of children with ASD were often ambiguous9, a result in agreement with a study using a non-CV approach6. A recent study12 extracted the dynamics of facial landmarks to estimate the group differences across various emotional expressions. Individuals with ASD exhibited a higher range of facial landmarks' dynamics compared to TD individuals across all emotions assessed. To quantify the complexity of facial landmarks' dynamics, researchers have started to explore computational tools such as autoregressive models13 and entropy measures14. These studies found that individuals with ASD exhibit distinctive complexity in their facial dynamics, compared to TD individuals, when they were asked to mimic given emotions. One of the standard measures used to analyze the complexity of physiological signals (e.g., facial dynamics) is the multiscale entropy (MSE)14-18, discussed and extended in this work.
The present Example focuses on analyzing the complexity of spontaneous facial dynamics of toddlers with and without ASD. Toddlers watched developmentally appropriate and engaging movies presented on a smart tablet. Simultaneously, the frontal camera of the tablet was used to record the toddlers' faces, providing the opportunity for automatic analysis via CV. Specifically, we studied the facial landmarks' dynamics of the toddlers with ASD versus TD, quantified in terms of a complexity estimate derived from MSE analysis.
We hypothesized that the complexity in landmarks' dynamics would differentiate toddlers with and without ASD, offering a distinctive biomarker. We hypothesized that the toddlers with ASD would exhibit higher complexity (i.e., less predictability) in the landmarks' dynamics associated with regions such as the eyebrows, reflecting their distinctive eyebrow-raising patterns19, and the mouth, potentially related to atypical vocalization patterns20-22. Furthermore, we were interested in exploring whether our findings would support previous work showing atypical eyebrow19 and mouth22 movements in the ASD population. Lastly, we also examined whether the complexity in landmarks' dynamics provides complementary and nonredundant information to the estimated affective expressions that the toddlers manifested in response to the presented movies, or if they provide redundant information. In one of our previous studies19, we examined affect (i.e., emotional expressions) variation over a period of time while the toddlers were engaged and watched the presented movies. Though that work19 presented the feasibility of distinguishing between the ASD and TD groups based on patterns of affective expression, in our current work, we investigate the possibility of using the complexity of the raw facial landmarks' dynamics without considering any variation in affect. This is motivated in part by the fact that individuals with ASD are prone to elicit a mixture of emotions at the same time6; therefore, using the complexity of the raw landmarks' dynamics can offer further confidence and add additional information beyond affect. Thus, for a much larger dataset and re-designed stimuli, we replicate the affect-related analysis similar to that of Carpenter et al., "Digital behavioral phenotyping detects atypical pattern of facial expression in toddlers with autism," Autism Res., vol. 14, no. 3, pp. 488-499, 2020, and show that the complexity in facial dynamics is an independent and more powerful measure.
In this EXAMPLE, we demonstrate the following: (a) the MSE, as here extended to handle time-series with partially missing data and to compare across subjects, can characterize complexity in facial landmarks' dynamics; (b) the complexity in landmarks' dynamics can distinguish between ASD and TD groups; and (c) this complexity information is complementary to information about affective expressions estimated from computer vision-based algorithms.
Participants and Study Procedures. Toddlers between 17-36 months of age were recruited at four pediatric primary care clinics during their well-child visit. Toddlers received a commonly-used, caregiver-completed autism screening questionnaire, the Modified Checklist for Autism in Toddlers—Revised with Follow-up (M-CHAT-R/F)23, as part of routine clinical care. If a child screened positive on the M-CHAT-R/F, or a caregiver or clinician expressed any developmental concern, the child was evaluated by a child psychologist based on the Autism Diagnostic Observation Schedule—Toddler (ADOS-T) module24. The exclusion criteria were: (i) known hearing or vision impairments; (ii) child too upset; (iii) caregiver expressed no interest, not enough time, needed to take care of siblings, or unable to give consent in English or Spanish; (iv) child did not complete study procedures; and (v) clinical information missing. A total of 436 children (TD: 396 and ASD: 40) participated; 82.8% of caregivers had a college degree, 15.5% had a high school education, and 1.8% did not have a high school education; for additional demographics please see Table 1 above. Caregivers/legal guardians provided written informed consent, and the study was approved by the Duke University Health System Institutional Review Board (Pro00085434, Pro00085435).
Caregivers were asked to hold their child on their lap and an iPad was placed at a distance of about 60 cm from the child. The tablet's frontal camera recorded the child's behavior while short movies (each less than a minute and a half long) were presented on the tablet's screen. The movies consisted of both social and non-social components; see
Movies. The movies were presented through an application (app) on a tablet. These movies were strategically designed to elicit autism-relevant behaviors. For our current analysis, we categorized the movies according to whether they contained primarily social versus non-social elements (
Blowing Bubbles (˜44 secs): A man held a bubble wand and blew the bubbles, with some attempts successful and some failing, with eye contact, smiling and frowning. The movie included limited talking from the actor.
Spinning Top (˜53 secs): An actress played with a spinning top with both successful and unsuccessful attempts along with eye contact, smiling and frowning. The movie included limited talking from the actress.
Rhymes and Toys (˜49 secs): An actress recited nursery rhymes, such as Itsy-Bitsy Spider, while smiling and gesturing, followed by a series of dynamic, noise-making toys which were shown without the presence of the actress on the scene.
Make Me Laugh (˜56 secs): An actress engaged in silly, funny actions while smiling and making eye contact.
Mechanical Puppy (˜25 secs): A mechanical toy puppy barked and walked towards vegetable toys.
Dog in Grass Right-Right-Left (RRL) (˜40 secs): A barking puppy was shown in different parts of the screen, followed by a series of appearances in a right-right-left (RRL) pattern.
Facial Landmark Detection and Preprocessing. A face detection algorithm was used first to identify the number of faces detected in each of the recorded video frames25. Using a low-dimensional facial embedding26,27, we ensured that we tracked only the participant's face throughout the video, ignoring other detected faces associated with the caregiver, siblings (if any), and clinical practitioners. Once the child's facial image was detected and tracked, we extracted 49 facial landmark points28. These 2D positional coordinates of 49 facial landmarks were time synchronized with the presented movies.
The facial landmarks were then preprocessed in two steps for our further analysis, namely (1) first compensating for the effects due to global head motions via global shape alignment, and (2) removing the time segments when the participants were not attending to the stimuli. For step 1, we utilized the points from the corners of the eyes and the nose (
For step 2 of preprocessing, since we were interested in studying the dynamics of the facial landmarks as a spontaneous response to the presented movies, we focused our analysis on time segments in which the participants were considered to be engaged with the presented movies. To this end, we filtered the segments considering two criteria: (1) extreme non-frontal head pose, and (2) rapid head movement (which in part can render the computation of landmarks very unstable). A non-frontal head pose was defined as the frames where the head pose angles lay outside the ranges ±20° for θpitch and θyaw, and ±45° for θroll. Frames containing rapid head movement were removed by analyzing the angular speed of the head pose. We calculated a one-second moving average (θ′) of the time-series data for θroll, θpitch, and θyaw. Using these smoothed versions of the head pose coordinates, we excluded frames where the difference (θdiff) between the current frame i and the previous frame i−1 was more than 5°, estimated using:

θdiff = |θ′(i) − θ′(i−1)|.
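The frame-filtering step above (a one-second moving average of a head-pose angle, then excluding frames whose smoothed angle changes by more than 5° from the previous frame) might be sketched as follows; the 30 fps frame rate, function name, and synthetic pose trace are illustrative assumptions:

```python
import numpy as np

FPS = 30          # assumed camera frame rate
MAX_DIFF_DEG = 5  # threshold on per-frame change of the smoothed pose

def valid_frames(theta):
    """Given a 1-D array of head-pose angles (degrees) per frame, return a
    boolean mask keeping frames whose one-second moving-average angle
    changes by at most 5 degrees from the previous frame."""
    kernel = np.ones(FPS) / FPS
    smooth = np.convolve(theta, kernel, mode="same")   # ~1 s moving average
    diff = np.abs(np.diff(smooth, prepend=smooth[0]))
    return diff <= MAX_DIFF_DEG

# Synthetic yaw trace with an abrupt head turn in the middle.
theta_yaw = np.concatenate([np.zeros(60), np.full(30, 180.0), np.zeros(60)])
mask = valid_frames(theta_yaw)
print(f"kept {mask.sum()} of {mask.size} frames")
```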
Computation of Facial Landmarks' Dynamics. After the above described preprocessing, and considering only the valid/attending frames, we extracted the landmarks' dynamics, concentrating on the eyebrows and the mouth regions (
Multiscale Entropy for Measuring the Complexity of Landmarks' Dynamics. Multiscale entropy (MSE) is used as a measure of dynamic complexity, quantifying the randomness or unpredictability of a time-series physiological signal operating at multiple temporal scales15,16, including facial dynamics14,17. Briefly, entropy helps in quantifying the unpredictability or randomness in a sequence of numbers; the higher the entropy, the higher its unpredictability. The sample entropy is a modified version widely used to assess the complexity of time-series33,34. These concepts are formalized next.
The MSE is estimated by calculating the sample entropy on a time-series X = {x1, x2, . . . , xN} with length N at multiple timescales τ. To this end, the time-series X is represented at multiple resolutions ({y_j^(τ)}) by coarse-graining X as

y_j^(τ) = (1/τ) Σ_{i=(j−1)τ+1}^{jτ} x_i,  1 ≤ j ≤ N/τ.
Here, we downsampled the landmarks' dynamics time-series data across the frames for 0 to 30 scales. During this downsampling process, a downsampled data point y_j was filled with the average of the corresponding x_i values only when a minimum of 50% of those x_i values were not missing (see
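The coarse-graining with the 50%-missing-data rule can be sketched as follows, with NaN marking missing frames; this is an illustrative reimplementation under stated assumptions, not the study's code:

```python
import numpy as np

def coarse_grain(x, tau, min_valid=0.5):
    """Coarse-grain a time-series at scale tau: each output point is the
    mean of a non-overlapping window of tau samples, kept only when at
    least `min_valid` of the window's samples are not missing (NaN)."""
    n = len(x) // tau
    x = np.asarray(x[: n * tau], dtype=float).reshape(n, tau)
    valid = ~np.isnan(x)
    # Mean over the valid samples in each window (0 where nothing is valid).
    means = np.where(valid, x, 0.0).sum(axis=1) / np.maximum(valid.sum(axis=1), 1)
    # Keep the window only when enough of its samples were present.
    return np.where(valid.mean(axis=1) >= min_valid, means, np.nan)

x = np.array([1.0, 3.0, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan])
# Windows at tau=2: [1,3] -> 2.0; [nan,nan] -> nan; [2,nan] -> 2.0; [nan,nan] -> nan
print(coarse_grain(x, tau=2))
```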
As mentioned before, the sample entropy is a measure of the irregularity of a signal; at a given embedding dimension m and a positive scalar tolerance r, the sample entropy is given by the negative logarithm of the conditional probability that if the sets of simultaneous data points having length m repeat within the distance r, then the sets of simultaneous data points having length m+1 also repeat within the distance r. Consider a time-series (landmarks' dynamics) of length N as X = {x1, . . . , xi, . . . , xN}, from which the m-dimensional vector X_i^m = {x_i, x_{i+1}, . . . , x_{i+m−1}} is formed. The distance d(X_i^m, X_j^m) between any two vectors is defined as

d(X_i^m, X_j^m) = max{|x_{i+k} − x_{j+k}| : 0 ≤ k ≤ m−1}.

Consider C_m(r) to be the cumulative sum of the number of repeating vectors in the m-dimension, i.e., vector pairs with d(X_i^m, X_j^m) ≤ r and i ≠ j, and analogously C_{m+1}(r) to be the cumulative sum of repeating vectors in the (m+1)-dimension. Then, the sample entropy (SampEn) is defined as

SampEn(m, r, N) = −ln(C_{m+1}(r) / C_m(r)).
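A minimal sample-entropy implementation along these lines (Chebyshev distance, self-matches excluded) might look like the following. This is an illustrative sketch of a common simplified variant, not the study's code, and it ignores the missing-data handling described elsewhere in this section:

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(C_{m+1}(r) / C_m(r)), using the Chebyshev
    (maximum absolute difference) distance and excluding self-matches."""
    x = np.asarray(x, dtype=float)

    def count_matches(dim):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + dim] for i in range(len(x) - dim + 1)])
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if np.max(np.abs(templates[i] - templates[j])) <= r:
                    count += 1
        return count

    c_m, c_m1 = count_matches(m), count_matches(m + 1)
    return -np.log(c_m1 / c_m) if c_m > 0 and c_m1 > 0 else float("inf")

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 300))  # highly predictable signal
noisy = rng.normal(size=300)                       # highly unpredictable signal
e_reg, e_noisy = sample_entropy(regular), sample_entropy(noisy)
print(f"regular: {e_reg:.2f}, noisy: {e_noisy:.2f}")
```

As expected from the definition, the unpredictable signal yields a higher SampEn than the periodic one.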
When estimating C_m and C_{m+1}, the choice of the dimension m, the tolerance value r34, and the missing data in the landmarks' dynamics34,35 play a vital role. This is discussed next.
For our study, we set m=2 because it was the most commonly used value in previous similar studies, e.g., references 14-17. However, the choice of different m values did not greatly affect our findings (see Appendix I, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TAFFC.2021.3113876). We selected r = 0.15σ, where σ is the standard deviation of the time-series. The value 0.15 was chosen as suggested in other studies, e.g., references 16, 35, 36. It was also evident that a time-series having noisy spikes can increase the tolerance r, because r is a function of σ, which can be vulnerable to noise34. As a result, even if the time-series data were complex (irregular), as r increases, we allow a higher degree of tolerance to match repeating vector sequences in both the m and m+1 dimensions, resulting in a lower SampEn value. This can potentially be mitigated by removing the noisy spikes; we have already taken care of this for our landmarks' dynamics via the previously described preprocessing steps. Another challenge arises when we compare the SampEn computed on landmarks' dynamics that came from two different participants having two different values of σ influencing the tolerance r. There is a persisting misconception in the literature in using a σ calculated on a time-series at the per-participant level and comparing the resulting SampEn across different participants, causing severe bias in the final outcome. To overcome this spurious effect and compare signals consistently across participants, for the definition of r we used the population standard deviation rather than the standard deviation associated with each participant. Finally, it is possible that individuals with ASD tend to exhibit large amounts of head movement, causing a large amount of missing data in the time-series and challenging the SampEn estimation.
To handle any such missing data and consistently estimate the SampEn, we selected the segment in the m+1 dimension only if the respective vector was embedded with no missing data (see
Below this threshold, the estimation of the SampEn may not be reliable. For our analysis, the 40% threshold depended on the length of the movies; participants having less than 40% of valid data were removed from the analysis for a specific movie.
Estimation of Affective States. In addition to studying the complexity of facial dynamics, we also considered the more standard approach of investigating affective expressions, and explicitly show that these two measurements are not redundant. For consistency, we used the pose-invariant affect method from our previous work [37], although other approaches could be used for this secondary analysis. For each participant, we estimated the probability of three categories of affective expression from four facial expressions: positive (happy expression), neutral (neutral expression), and negative (angry and sad expressions).
Statistical Analysis. Statistically significant differences between the groups' distributions were tested using the Mann-Whitney U test; in particular, the python function pingouin.mwu was used. Within-group comparisons (e.g., between the eyebrows and mouth regions) were performed using the Wilcoxon signed-rank test with the python function pingouin.wilcoxon. Effect sizes were estimated with the standard r value for both significance tests.
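The group comparison can be sketched with scipy equivalents of the pingouin calls used in the paper (the group values below are synthetic stand-ins, and the effect size r is reproduced here via the normal approximation Z/sqrt(n1+n2), which is what `pingouin.mwu` reports as "RBC"-adjacent effect measures; our formula is one common convention):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# synthetic stand-ins for per-participant integrated-entropy values
asd = rng.normal(1.6, 0.4, 40)    # hypothetically higher complexity
td = rng.normal(1.0, 0.4, 200)

u, p = stats.mannwhitneyu(asd, td, alternative="two-sided")

# effect size r = |Z| / sqrt(n1 + n2), via the normal approximation of U
n1, n2 = len(asd), len(td)
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
r_effect = abs(z) / np.sqrt(n1 + n2)
```

For paired within-group comparisons (eyebrows vs. mouth), `scipy.stats.wilcoxon` would be substituted in the same way.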
Since there could be possible confounds on our complexity measures due to covariates such as (1) the 'percentage of missing data' that was removed during preprocessing and (2) the 'variation in the landmark movements,' we performed additional statistics, e.g., analysis of covariance (ANCOVA) using pingouin.ancova. Additionally, for cross-correlation analysis, we computed Spearman's r using scipy.stats.spearmanr in python. A decision tree-based classifier [38] was used to assess the possible separation between the ASD and TD groups using our complexity analysis and affect-related measures while considering each individual movie. 'Gini impurity' was used as the automatic splitting criterion. The differences between the Areas Under the Curve (AUCs) of the Receiver Operating Characteristic (ROC) based on the different features and movies were compared using the DeLong method [39]. Additionally, logistic regression (with the python function sklearn.linear_model.LogisticRegression) was used to estimate odds ratios to predict the risk for ASD in toddlers. For the logistic regression, we again used both the proposed landmarks' complexity and the affect-related measures.
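The decision-tree classification step can be sketched as below. The features here are synthetic stand-ins for the two integrated-entropy inputs (eyebrows and mouth); group sizes and separations are illustrative, not the study's data. The DeLong AUC comparison is not part of scikit-learn and would require a separate implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
# synthetic stand-ins for [eyebrows, mouth] integrated-entropy features
X = np.vstack([rng.normal(1.6, 0.4, (40, 2)),    # ASD-like: higher entropy
               rng.normal(1.0, 0.4, (200, 2))])  # TD-like: lower entropy
y = np.array([1] * 40 + [0] * 200)

# Gini impurity as the splitting criterion, per the text
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
proba = cross_val_predict(tree, X, y, cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
```

Cross-validated probabilities (rather than resubstitution scores) keep the reported AUC from being overoptimistic.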
The two main questions explored in this section are: (1) whether the estimation of complexity in landmarks' dynamics can be used as a distinctive biomarker to distinguish between ASD and TD participants, and (2) whether the estimated complexity measure adds value beyond traditional measures of facial emotional expressions.
Complexity of Facial Landmarks' Dynamics. To address our first research question, we estimated the MSE for the eyebrows and mouth regions of the participants in both the ASD and TD groups (
Though the integrated entropy was significantly different between the ASD and TD groups, we still wanted to check whether possible confounds such as (1) the 'percentage of missing data' and (2) the 'variation in the landmark dynamics' had any influence on our findings. To do so, we performed a cross-correlation analysis. The results indicated a weak correlation (ranging 0.2-0.42) between the integrated entropy and the 'percentage of missing data' as well as the 'variation in the landmark dynamics'. These low correlation values indicate that (1) the integrated entropy is not driven by the 'percentage of missing data,' and (2) the integrated entropy measures are not simply capturing the 'variation in the landmarks' dynamics.' Though we considered only the participants who had a minimum of 40% of landmarks' data after preprocessing, we still wanted to see whether the 'percentage of missing data' was statistically different between the groups. The results indicate that the two groups are not significantly different with regard to the 'percentage of missing data'. Additionally, we used these two possible confounds as covariates in an ANCOVA to further confirm the statistical difference between the ASD and TD groups. Even after adjusting for the two confounds, either individually or together, the significant difference was maintained between the ASD and TD groups, with p<0.0001 for social movies and p<0.001 for non-social movies, indicating that the integrated entropy was not influenced by these confounds.
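The cross-correlation confound check amounts to a Spearman rank correlation per covariate. A minimal sketch, using synthetic per-participant values (the variable names and distributions are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# synthetic per-participant values: integrated entropy vs. % missing data
entropy = rng.normal(1.3, 0.4, 120)
missing_pct = rng.uniform(0, 60, 120)   # independent of entropy by design

rho, p = stats.spearmanr(entropy, missing_pct)
# a weak |rho| (here near 0) suggests entropy is not driven by missingness
```

In the study the observed correlations fell in the 0.2-0.42 range, weak enough to treat the covariates as non-explanatory, with the ANCOVA adjustment as a second check.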
Additionally, to understand whether the two regions of interest (eyebrows and mouth) had different levels of complexity, we compared the integrated entropy values between these two regions within the ASD and TD groups. The results indicated that the integrated entropy values for the two regions were not significantly different from each other (p>0.05 with effect size (r)<0.1), irrespective of the movie, for both the ASD and TD groups. Additionally, to understand the individualized range of change in the integrated entropy across the tasks, we estimated the range defined as,
Landmarks' Complexity and Affective State. This section addresses our second research question. A comparative analysis of the energy calculated on the first derivative of the time-series data related to neutral, positive, and negative affect indicated that the participants expressed a higher probability of neutral affect in response to all the movies (Table 2). This finding is consistent with previous work from our group [19]. Our movies were not designed to elicit any negative affect; as expected, we did not observe any significant negative facial emotions. We now consider the energy calculated from the first derivative of the positive affect's time-series data (PositiveEnergy′) for further analysis. To understand whether the observed complexity in facial landmarks' dynamics was simply an outcome of expressing positive affect in response to the movies, we extracted the correlation coefficient between the integrated entropy and PositiveEnergy′ for all the movies (combining both the ASD and TD groups). The results (Table 3) indicated that though some dependency existed for certain movies, such as Blowing Bubbles (BB) and Mechanical Puppy (MPuppy), it was not the case for other movies such as Spinning Top (ST), Rhymes and Toys (RAT), Make Me Laugh (MML), and Dog in Grass (RRL). The results were similar even when the analysis was done separately for the ASD and TD groups. Thus, the landmarks' dynamics were possibly a combination of affect and other manifestations such as atypical mouth movements [20]-[22], and frequently raised eyebrows and open mouth potentially reflecting the level of attentional engagement [19]. Furthermore, the correlation coefficient (r) was comparatively more pronounced between the integrated entropy of the mouth region and PositiveEnergy′, where the positive affect (smile) can be more prominent than in the eyebrows.
It was evident from these results that although a partial correlation existed with the affect data, the complexity of the landmarks' dynamics can offer a complementary measure with additional unique information.
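The "energy of the first derivative" of an affect time-series can be read as the sum of squared frame-to-frame changes in the affect probability. That reading is our assumption (the paper does not spell out the formula), so the helper below is a hypothetical sketch:

```python
import numpy as np

def affect_energy(prob_series):
    """Energy of the first derivative of an affect-probability time-series
    (sum of squared frame-to-frame changes); an assumed reading of
    PositiveEnergy' in the text."""
    d = np.diff(np.asarray(prob_series, dtype=float))
    return float(np.sum(d ** 2))
```

Under this definition a flat affect probability contributes zero energy, while rapid switches between affect states contribute strongly, which is consistent with how the measure is used to contrast against entropy.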
Classification Approach Using Landmarks' Complexity. To understand the feasibility of using the proposed complexity measure to automatically classify the individuals with ASD and TD, we used the integrated entropy values of the eyebrows and mouth regions as an input to a decision tree-based model. Since the statistical tests showed that the significant difference in integrated entropy between the groups had larger effect sizes during social movies, we have considered only those movies for this analysis.
Furthermore, the odds ratios calculated from the logistic regression (Table 5) indicate that higher values of integrated entropy increased the odds of predicting risk of ASD by up to 1.8 times when considering either of the social movies. On the other hand, the changes in affective expression (i.e., PositiveEnergy′) did not offer a positive odds ratio (that is, 0.8, which is less than 1). However, both the integrated entropy and PositiveEnergy′ had a significant contribution (p<0.05) in fitting the logistic regression model. Again, using only the integrated entropy offered the same results as when combined with PositiveEnergy′, indicating that the integrated entropy was independent of the affect measure and powerful enough to distinguish the ASD and TD groups.
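An odds ratio from a fitted logistic regression is the exponential of the corresponding coefficient. A minimal sketch with synthetic entropy values (group sizes and the 1-SD standardization are our illustrative choices, not the study's exact protocol):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# synthetic integrated-entropy values; 1 = ASD, 0 = TD (illustrative only)
x = np.concatenate([rng.normal(1.6, 0.4, 40), rng.normal(1.0, 0.4, 200)])
y = np.array([1] * 40 + [0] * 200)

# standardize so the odds ratio reads "per 1-SD increase in entropy"
z = ((x - x.mean()) / x.std()).reshape(-1, 1)
model = LogisticRegression().fit(z, y)
odds_ratio = float(np.exp(model.coef_[0, 0]))  # >1: higher entropy, higher odds
```

An odds ratio above 1 (as reported for integrated entropy) means the predictor raises the modeled odds of ASD, while one below 1 (as reported for PositiveEnergy′) means it lowers them.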
We designed an iPad-based application (app) that displayed strategically designed, developmentally appropriate short movies involving social and non-social components. Children diagnosed with ASD and children with typical development (TD) took part in our study, watching the movies during their well-child visit to pediatric clinics. The device's front-facing camera was used to record the children's behavior and capture ASD-related features. Our current work focused on exploring biomarkers related to facial dynamics. We analyzed the children's facial dynamics in the eyebrows and mouth regions using multiscale entropy (MSE) analysis to study the complexity of the facial landmarks' dynamics. The complexity analysis may give insights about the level of irregularity (or ambiguity) in the facial dynamics that can potentially be used as a biomarker to distinguish between children with ASD and those with typical development (TD). Basically, the complexity estimates using entropy offer information about how easy it is to predict the facial landmark dynamics rather than just their variation. Specifically, a time-series with high variations that are highly periodic will result in extremely low values of the entropy measure. Similarly, when the time-series is almost stable with very low variations, the entropy value will still be low (which was the case for our TD participants). In contrast, for a time-series with high variability and irregular, non-periodic movements, the entropy value will be high, indicating higher complexity in facial dynamics (which was the case for the ASD participants). We speculate that the presence of greater predictability with minor facial movements in the TD toddlers reflects a higher level and more consistent understanding of the shared social meaning of the content of the movies (e.g., rhymes, a conversation).
If so, the TD toddlers might be expected to respond more predictably to the stimuli, whereas the responses of the children with autism may be more idiosyncratic. It has been previously documented (e.g., reference [9]) that children with autism make more atypical and ambiguous facial expressions and that these vary across children with autism.
As expected, the results of our modified approach to MSE analysis captured distinctive landmarks' dynamics in children with ASD, characterized by a significantly higher level of complexity in both the eyebrows and mouth regions when compared to typically-developing children. This measure can be robust and complementary to other measures such as affective state [19]. The observations from the integrated entropy support recent work indicating that individuals with ASD often exhibit a higher probability of neutral expression [19]; neutral expressions might be interpreted by others as more ambiguous in terms of the affective state they convey. The results reported here are also in agreement with other works related to atypical speech and mouth movements [21], [22], offering scope and directions for further exploration. Also, since it has been shown that individuals with ASD have difficulties in affect coordination during interpersonal social interaction [40], it would be interesting to study the potential of complexity/coordination in facial dynamics in such a context in the future. Finally, we observed that the proposed integrated entropy (the sum of SampEn over scales 1-20 of the MSE) might hold promise in distinguishing children with ASD not only from TD children but also from those with other developmental disorders (e.g., developmental delay/language delay). Additionally, the integrated entropy offered better performance in classifying the ASD and TD groups when used with machine learning-based classifiers, e.g., a decision tree-based model, offering an avenue to build an automated decision-making pipeline for conveying a probability of risk in children with ASD while using the app in home-based settings. Nevertheless, in addition to the complexity measure, it would be necessary to combine other features such as gaze-related indices, name-call response, and signatures of motor deficits, as mentioned in [7], before deploying such an automated decision-making tool.
Complementary to the facial landmarks' dynamics, the known deficits in motor control in children with ASD can be manifested in the form of poor postural control, which can be captured easily with our data. Exploring complexity estimates in these head motions can be an interesting direction for future work.
Limitations of this study include: landmarks and affect detection were based on algorithms trained primarily on adults, although our previous work showed they were still reliable for toddlers; other measures of complexity might be more robust than the MSE in their ability to discriminate children with and without ASD; and the study sample, while relatively large, still had a limited number of ASD participants and did not have sufficient power to determine the impact of demographic characteristics on the results.
To conclude, in this work, we introduced a newly normalized measure of MSE more suited for across-subject comparison and demonstrated that the complexity of facial dynamics has potential as an ASD biomarker beyond more traditional measures of affective expression. Our findings were consistent with previous work on patterns of affective expression in children with ASD, while adding new discoveries underscoring the value of dynamic facial primitives. Considering that autism is a very heterogeneous condition, the combination of the novel biomarkers described here with additional biomarkers has the potential to improve the development of scalable screening, diagnosis, and treatment monitoring tools.
Complexity Analysis of Head Movements in Autistic Toddlers
Early differences in sensorimotor functioning have been documented in young autistic children and infants who are later diagnosed with autism. Previous research has demonstrated that autistic toddlers exhibit more frequent head movement when viewing dynamic audiovisual stimuli, compared to neurotypical toddlers. To further explore this behavioral characteristic, in this study, computer vision (CV) analysis was used to measure several aspects of head movement dynamics of autistic and neurotypical toddlers while they watched a set of brief movies with social and nonsocial content presented on a tablet. Methods: Data were collected from 457 toddlers, 17 to 36 months old, during their well-child visits to four pediatric primary care clinics. Forty-one toddlers were subsequently diagnosed with autism. An application (app) displayed several brief movies on a tablet, and the toddlers watched these movies while sitting on their caregiver's lap. The front-facing camera in the tablet recorded the toddlers' behavioral responses. CV was used to measure the participants' head movement rate, movement acceleration, and complexity using multiscale entropy. Results: Autistic toddlers exhibited significantly higher rate, acceleration, and complexity in their head movements while watching the movies compared to neurotypical toddlers, regardless of the type of movie content (social vs. nonsocial). The combined features of head movement acceleration and complexity reliably distinguished the autistic and neurotypical toddlers. Conclusions: Autistic toddlers exhibit differences in their head movement dynamics when viewing audiovisual stimuli. Higher complexity of their head movements suggests that their movements were less predictable and less stable compared to neurotypical toddlers.
CV offers a scalable means of detecting subtle differences in head movement dynamics, which may be helpful in identifying early behaviors associated with autism and providing insight into the nature of sensorimotor differences associated with autism.
Autism is characterized by differences in social communication and the presence of restrictive and repetitive behaviors (American Psychiatric Association, 2014). In addition to the presence of motor stereotypies, other motor differences often associated with autism include impairments in fine and gross motor skills, motor planning, and motor coordination (Bhat, 2021; Flanagan et al., 2012; Fournier et al., 2010; Melo et al., 2020). Studies based on home videos of infants who were later diagnosed with autism reported asymmetry in body movements (Baranek, 1999; Esposito & Venuti, 2009; Teitelbaum et al., 1998). Detailed motor assessments have documented difficulties in gait and balance stability, postural control, movement accuracy, manual dexterity, and praxis among autistic individuals (Chang et al., 2010; Minshew et al., 2004; Molloy et al., 2003; Wilson, Enticott, & Rinehart, 2018; Wilson, McCracken, et al., 2018).
Recent research utilizing computer vision analysis to measure differences in movement patterns has documented differences in patterns of head movement dynamics while watching dynamic audiovisual stimuli among autistic children. Martin and colleagues (Martin et al., 2018) examined differences in head movement displacement and velocity in 2.5- to 6.5-year-old children with and without a diagnosis of autism while they watched a video of social and nonsocial stimuli. Head movement differences between the autistic and neurotypical children were found in the lateral (yaw and roll) but not vertical (pitch) movement and were specific to periods when children were watching social videos. These authors suggested that the autistic children may use head movements to modulate their perception of social scenes. Zhao et al. quantified three-dimensional head movements in 6- to 13-year-old children with and without autism while they were engaged in a conversation with an adult (Zhao et al., 2021). They found that the autistic children showed differences in their head movement dynamics not explained by whether they were fixating on the adult. Dawson et al. (2018) found that toddlers who were later diagnosed with autism exhibited a significantly higher rate of head movement while watching brief movies as compared to neurotypical toddlers.
In the current study, we extended our earlier work on head movements in autistic toddlers (Dawson et al., 2018) in two ways: First, we sought to replicate our earlier findings related to the head movement rate with a significantly larger sample using similar but re-designed novel movies with social versus nonsocial content. Second, we expanded our analysis by not only computing head movement rate, but also the acceleration and the complexity of the time-series associated with the head movements. The acceleration provided an estimate of changes in head movement rate (i.e., velocity), while the complexity estimate reflected the level of predictability and stability of head movements (Costa et al., 2002). We used multiscale entropy (MSE) to quantitatively assess the complexity or predictability of head movement dynamics (Costa et al., 2002). This metric quantified the regularity of the one-dimensional time-series on multiple scales (Costa et al., 2002; Zhao et al., 2021).
We hypothesized that, compared to neurotypical toddlers, autistic toddlers would exhibit higher head movement rate, acceleration, and increased complexity (less predictability). We also examined whether differences in head movement measures were more pronounced when autistic children watched movies with high levels of social content. Finally, we used machine learning classification analyses to determine whether these measures can be integrated to distinguish autistic and neurotypical toddlers.
Participants. Participants were 457 toddlers, 17-36 months of age, who were recruited from four pediatric primary care clinics during a well-child checkup. Forty-one toddlers were subsequently diagnosed with autism spectrum disorder (ASD) based on DSM-5 criteria. Inclusion criteria were: (a) age 16-38 months; (b) not ill at the time of visit; and (c) the caregiver's primary language at home was English or Spanish. Exclusion criteria were: (a) known hearing or vision impairments; (b) the child was too upset during the visit; (c) the caregiver expressed no interest or did not have enough time; (d) the child was not able to complete the study procedures (e.g., the child would not stay in their caregiver's lap, the app or device failed to upload data, or the clinical information was missing); and/or (e) presence of a significant sensory or motor impairment that precluded the child from seeing the movies and/or sitting upright. Table 1 shows the participants' demographic characteristics.
Ethical considerations. Caregivers provided written informed consent, and the study protocols were approved by the Duke University Health System Institutional Review Board (Pro00085434, Pro00085435).
Clinical measures. Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F). As a part of routine clinical care, all participants were assessed with a commonly used autism screening questionnaire, the M-CHAT-R/F (Robins et al., 2014). The M-CHAT-R/F consists of 20 questions answered by the caregiver to evaluate the presence/absence of autism-related symptoms.
Diagnostic and cognitive assessments. Toddlers with a total M-CHAT-R/F score≥3 initially, those for whom the total score was 2 after the follow-up questions, and/or those for whom the pediatrician and/or parent expressed developmental concerns were referred for diagnostic evaluation by the study team psychologist. The Autism Diagnostic Observation Schedule—Toddler Module (ADOS-2) was administered by research-reliable licensed psychologists who determined whether the child met DSM-5 criteria for ASD (Luyster et al., 2009). Cognitive and language abilities were assessed using the Mullen Scales of Early Learning (Mullen, 1995).
Group definitions. Group definitions were: (1) Autistic (N=41), defined as having a positive M-CHAT-R/F score or caregiver/physician-raised concerns and subsequently meeting DSM-5 diagnostic criteria for autism spectrum disorder (ASD), with or without developmental delay, based on both the ADOS-2 and clinical judgment by a licensed psychologist; and (2) Neurotypical (N=416), defined as (a) having a high likelihood of typical development with a negative score on the M-CHAT-R/F and no developmental concerns raised by caregiver/physician, or (b) having a positive M-CHAT-R/F score and/or caregiver/physician-raised concerns, but then determined, based on the ADOS-2 and the clinical judgment of the psychologist, as not having developmental or autism-related concerns. There was another group of participants (N=12) who had a positive M-CHAT-R/F score and received a diagnosis other than autism (e.g., language delay without autism). Given the small sample size, we excluded these participants from our current analyses.
Application (app) administration and stimuli. In each of the four clinics, a quiet room with few distractions was identified in which the app could be administered. Although it was quiet and without a lot of distraction, it was not otherwise controlled as in a laboratory setting. The rooms for the four clinics were similar in size, lighting, and presence of distractions (e.g., table in the room).
App administration. We designed an app compatible with iOS devices which displayed developmentally appropriate movies. The front-facing camera was used to record the toddlers' behavioral responses while watching the movies. Caregivers were requested to hold their child on their lap while a tablet was placed on a tripod about 60 cm in front of the child. No special instructions were given regarding how to hold the child on their lap. The parent sat quietly throughout the app administration and was asked not to guide the child's behavior or give instructions. Other family members (e.g., siblings) and the assistant who administered the app stayed behind both the caregiver and the child to reduce distractions during the experiment. We computed and analyzed the participants' head movements that were recorded while they watched four movies with high social content and three movies with low social content (nonsocial), described next.
Social movies containing human actors in the scene. Blowing-Bubbles (˜64 s): An actor with a bubble wand blew the bubbles, along with smiling and frowning and limited verbal expressions (
Nonsocial movies containing dynamic objects. Floating-Bubbles (˜35 s): Bubbles were presented at random and moved throughout the frame (
Capturing facial landmarks and head orientation using computer vision. A computer vision algorithm was used to detect the faces in each frame of the recorded video (King, 2009). Similar to (Chang et al., 2021; Perochon et al., 2021), scarce human supervision was triggered by a face-tracking algorithm to ensure that we tracked only the participant's face. Then, we extracted 49 facial landmarks consisting of 2D-positional coordinates (Baltrusaitis et al., 2018) (
Rate. We computed the average Euclidean displacements of three central landmarks (represented in red colors in
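The rate computation can be sketched as the mean Euclidean frame-to-frame displacement of the tracked landmarks. The function name, the input shape, and the scaling to per-second units are our assumptions for illustration; the paper specifies only "average Euclidean displacements" of three central landmarks.

```python
import numpy as np

def mean_rate(landmarks, fps=30):
    """Mean head-movement rate: average per-frame Euclidean displacement of
    the tracked landmarks, scaled to displacement per second (assumed units).
    landmarks: array of shape (n_frames, n_points, 2) with x, y positions."""
    disp = np.linalg.norm(np.diff(landmarks, axis=0), axis=2)  # (frames-1, pts)
    return disp.mean() * fps
```

A head drifting steadily by one pixel per frame at 30 fps would thus score a rate of 30 pixel-units per second.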
Acceleration. We estimated the child's acceleration from the rate of head movements. Intuitively, acceleration is associated with the child's head movement rate (i.e., their physical velocity), and therefore the first derivative of this quantity can be interpreted as the second derivative of the head positions. This second-order derivative is of particular interest since it relates to the magnitude of the instantaneous forces involved in head movement. We estimated the absolute mean of the acceleration (Mean_accelHM) as the difference in head movement rate between two consecutive frames, averaged over a 1/3 s window. We also estimated the total energy of the head movements, which was not as powerful as Mean_accelHM in detecting a statistical difference between the two groups.
Complexity. To estimate the complexity of the head movement rate (time-series) at multiple time resolutions using MSE (Costa et al., 2002, 2005), a time-series X = {x_1, ..., x_i, ..., x_N} was down-sampled to 30 different scales (τ = 1 to 30) and represented as
Subsequently, sample entropy (SampEn) was calculated on each of these resolutions of the time-series. SampEn can be defined as an estimate of irregularity in a time-series: given an embedding dimension m and a positive scalar tolerance r, the SampEn is the negative logarithm of the conditional probability that if sets of simultaneous data points of length m repeat within the distance r, then the sets of data points of length m+1 also repeat within the distance r (Richman & Moorman, 2000). If the repeatability is low, the SampEn will be high, and the time-series is considered more complex. Considering the m-dimensional embedding vector x_i^m = {x_i, x_{i+1}, ..., x_{i+m-1}} from the time-series X = {x_1, ..., x_i, ..., x_N} of length N, the distance d between two vectors x_i^m and x_j^m was defined as the maximum (Chebyshev) distance, d(x_i^m, x_j^m) = max_{k=0,...,m-1} |x_{i+k} - x_{j+k}|.
Equation (3) defines the sample entropy, where C^m(r) and C^{m+1}(r) denote the cumulative sums of the numbers of repeating vectors in the m and m+1 embedding spaces, respectively. Two vectors x_i^m and x_j^m were defined as repeating if they met the condition d(x_i^m, x_j^m) ≤ r, where i≠j. To handle any bias due to missing data while computing the SampEn, we only considered the segments where the data were available in the m+1 dimensional space (Dong et al., 2019). The parameter m was set to 2, similar to (Costa et al., 2002; Harati et al., 2016), and r = 0.15*σ, where 0.15 is a scaling factor chosen similar to (Costa et al., 2002; Dong et al., 2019; Lake, Richman, Pamela Griffin, & Randall Moorman, 2002), and σ denotes the signal's standard deviation that characterizes the time-series. Since σ can vary across different participants, we used the population-wise standard deviation; this choice defined a distance threshold r consistent across participants (see Krishnappababu et al., 2021 for a detailed discussion). Finally, a global complexity estimation (across multiple scales) was obtained by integrating the SampEn across the first 10 scales (Integrated_entropyHM). At least 40% of the data were necessary to perform an effective complexity analysis (Cirugeda-Roldan et al., 2014; Lake et al., 2002) after handling the missing segments similar to Krishnappababu et al. (2021). Below this threshold the estimation of the SampEn may not be reliable. Participants having <40% data were removed from the analysis for each specific movie.
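The coarse-graining step and the integration over the first 10 scales can be sketched as follows. This is a compact illustration with numpy only; `coarse_grain` and the helper `sampen` are our naming, the signal is synthetic, and the real pipeline additionally uses the population standard deviation and missing-data-aware segment selection described above.

```python
import numpy as np

def coarse_grain(x, tau):
    """Average consecutive non-overlapping windows of tau samples
    (the MSE down-sampling step of Costa et al.)."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // tau) * tau
    return x[:n].reshape(-1, tau).mean(axis=1)

def sampen(x, m=2, r=0.15):
    """Compact sample entropy with Chebyshev distance and tolerance r."""
    x = np.asarray(x, dtype=float)
    def count(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        return sum(int(np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= r))
                   for i in range(len(t) - 1))
    a, b = count(m + 1), count(m)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

rng = np.random.default_rng(0)
signal = rng.standard_normal(2000)     # stand-in for a head-movement rate series
r = 0.15 * signal.std()                # the paper uses the population std here

# Integrated_entropyHM: sum of SampEn over the first 10 scales
integrated_entropy = sum(sampen(coarse_grain(signal, tau), m=2, r=r)
                         for tau in range(1, 11))
```

Note that at scale τ the coarse-grained series keeps only N/τ points, which is why the 40% data-availability threshold matters: short or gappy series leave too few templates for a reliable SampEn at the larger scales.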
Statistical analysis. The Mann-Whitney U-test was used to estimate the statistical significance of differences between the groups, using python (pingouin.mwu). Within-group comparisons (e.g., between the social and nonsocial movies) were performed using the Wilcoxon signed-rank test in python (pingouin.wilcoxon). The statistical power was calculated using the effect size 'r' provided by pingouin.mwu and pingouin.wilcoxon. A 2×2 mixed ANOVA was used to estimate the main effects of (a) group and (b) movie type (social and nonsocial) and their interaction effect, with the python function pingouin.mixed_anova. For the mixed ANOVA analysis, we estimated the mean values of Mean_rateHM, Mean_accelHM, and Integrated_entropyHM for the social and nonsocial movies across all the participants. Additionally, analysis of covariance (ANCOVA) using pingouin.ancova was performed to determine the influence of covariates such as participants' age and percentage of missing data. A support vector machine (SVM)-based classifier with a radial basis function (RBF) kernel (Cortes & Vapnik, 1995) was used to assess the classification power of the proposed features. Classification performance was compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) with leave-one-out cross-validation (Elisseeff & Pontil, 2003). 95% confidence intervals were computed with the Hanley and McNeil method (Hanley & McNeil, 1982). We chose the SVM since it is popular for relatively smaller datasets, and cross-validation was done to minimize the risk of overoptimistic classification (Vabalas et al., 2019). Classification performance of the different models was compared based on their true difference (d_t), estimated using the observed difference in error (d) and the sum of the variances of the error across the two models (σ_d), at a significance threshold of p<0.05: d_t = d ± 1.96*σ_d (Tan et al., 2006).
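The SVM pipeline with leave-one-out cross-validation and the Hanley-McNeil confidence interval can be sketched as below. The features are synthetic stand-ins for Mean_accelHM and Integrated_entropyHM; group sizes and separations are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
# synthetic stand-ins for [Mean_accelHM, Integrated_entropyHM]
X = np.vstack([rng.normal(1.6, 0.4, (30, 2)),    # autistic-like group
               rng.normal(1.0, 0.4, (150, 2))])  # neurotypical-like group
y = np.array([1] * 30 + [0] * 150)

# RBF-kernel SVM, leave-one-out CV to avoid overoptimistic estimates
svm = SVC(kernel="rbf", probability=True, random_state=0)
proba = cross_val_predict(svm, X, y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)

# Hanley & McNeil (1982) standard error for the AUC
n1, n2 = int(y.sum()), int(len(y) - y.sum())
q1, q2 = auc / (2 - auc), 2 * auc**2 / (1 + auc)
se = np.sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc**2)
              + (n2 - 1) * (q2 - auc**2)) / (n1 * n2))
ci = (auc - 1.96 * se, auc + 1.96 * se)
```

Leave-one-out predictions give each participant a score from a model that never saw them, so the AUC and its Hanley-McNeil interval reflect out-of-sample performance.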
If the dt spans over zero, then the two models are considered not significantly different.
Engagement with the app administration. The percentages of assessments to which the child attended for the majority of the app administration were 95% and 93% for the neurotypical and autistic groups, respectively.
Differences in rate, acceleration, and complexity of head movements
Rate.
Acceleration.
Complexity (entropy).
Within-group differences. Further, we analyzed the differences within each of the autistic and neurotypical groups for Mean_rateHM, Mean_accelHM, and Integrated_entropyHM in response to the social and nonsocial movies. The Wilcoxon signed-rank test indicated that the neurotypical group exhibited significantly higher Mean_rateHM (p<0.0001, r=0.47), Mean_accelHM (p<0.0001, r=0.62;
Relationship between head movement variables and cognitive ability. Mullen Scale scores were available for the autistic group. There was a positive correlation between the Mullen Early Learning Composite Score and the head movement measures, Mean_rateHM, Mean_accelHM, and Integrated_entropyHM, during the social movies (r's=0.36, 0.39 and 0.34, respectively; p's<0.05) but not during the nonsocial movies (all nonsignificant).
Combining acceleration and complexity via a classification framework. The Mean_accelHM and the Integrated_entropyHM were moderately correlated (r=0.4-0.5). We used these two measures from the four social movies for the classification analysis since effect sizes for group differences in head movements were larger during the social movies. We trained an SVM-based classifier using these two input features and group as the classification target to evaluate how these measures can be used to discriminate the groups. We evaluated the performance using information collected during a single movie and using a combination of all four social movies. For the latter analysis, we included the participants who had data for these two features from all four movies, resulting in N=31 for the autistic group and N=389 for the neurotypical group. Testing on individual movies, Mean_accelHM and Integrated_entropyHM distinguished the autistic and neurotypical groups (see
We demonstrated that a scalable app delivered to toddlers on an iPad during a well-child visit can be used to detect early head movement differences in toddlers diagnosed with autism. Similar to our previously published findings, autistic toddlers exhibited a higher rate of head movements while watching dynamic audiovisual movies, regardless of whether the content was social or nonsocial in nature. Furthermore, we found that the autistic toddlers also showed greater acceleration and complexity of their head movements compared to neurotypical toddlers. Our findings suggest that this sensorimotor behavior, which is exhibited while watching complex, dynamic stimuli and characterized by more frequent head movements that have higher acceleration and more complexity, is an early feature of autism. Moreover, in an analysis combining measures of head movement acceleration and complexity for each movie and across all movies with social content, we demonstrated that an SVM-based classifier based on head movement dynamics differentiated the autistic and neurotypical groups in a data-driven fashion.
The nature of these differences in head movement dynamics is not fully understood. Such differences do not appear to reflect the degree of attention to the movies, as the measures were only taken during the time frames when children were attending to the movies (facing forward and gazing towards the screen). Moreover, including the amount of attention to the movies as a covariate in our analyses did not affect our results. Similarly, the head movements do not appear to reflect degree of social engagement because the autistic children also showed differences in head movement dynamics while watching the nonsocial movies. Martin and colleagues found that autistic children exhibited higher levels of head movements only while viewing social stimuli and interpreted the movements as a mechanism for modulating their perception of the social stimuli (Martin et al., 2018). In contrast, we found that, whereas the neurotypical toddlers showed increased head movements during social as compared to nonsocial movies, the autistic toddlers nevertheless showed high levels of head movements during both types of movies. Thus, our data do not support the hypothesis that the head movements of the autistic toddlers were used to modulate the perception of social stimuli, per se. It is still possible, however, that the movements were more generally used to modulate sensory information across the different types of movies. Interestingly, autistic children with lower cognitive abilities showed higher levels of head movement rate, acceleration, and complexity specifically during viewing of the social movies. It is possible that children with lower cognitive abilities found the social movies more difficult to interpret, as these movies did involve the use of simple speech, facial expressions, and gestures by the actor.
Another possibility is that the head movements reflect differences in postural control. Previous studies of postural sway in autistic individuals have found that postural control difficulties increase when sensory demands are increased, such as when viewing stimuli requiring multisensory integration (Cham et al., 2021; Minshew et al., 2004). Examining the videos of toddlers with high levels of head movements in the present study revealed that the movement involves not just the head but also the upper body including trunk and shoulders, which were not captured by our computer vision algorithm, as we focused solely on the face in this study. Maintaining stability of posture and midline head control relies on complex sensorimotor processes which are challenged when viewing complex multisensory information. Difficulties in multisensory integration have been documented in autistic individuals (Donohue et al., 2012). Like some forms of repetitive behavior, head movements might also serve a regulatory function, especially if children found the stimuli arousing, similar to findings in studies of postural control (Cham et al., 2021).
Future research is needed to further explore the developmental course of differences in head movement dynamics in autism and elucidate their nature and neurobiological basis. Limitations of this study include the sample size, which was relatively large but included a smaller number of autistic children and did not offer sufficient power to determine the influence of biological and demographic characteristics, such as sex. Data from all participants were not used in some of the analyses because we used data from participants who attended for at least 40% of the movie length. The autistic and neurotypical groups differed in cognitive ability and, thus, the degree to which differences in cognition contributed to our findings is not clear. In summary, results of this study confirm that a difference in head movement dynamics is one of the early sensorimotor signs associated with autism. Combining this feature with other behavioral biomarkers such as gaze, facial dynamics, and response to name will allow us to develop a multimodal computer vision-based digital phenotyping tool capable of offering a quantitative and objective characterization of early behaviors associated with autism.
Autistic children exhibit more frequent head movements while watching dynamic stimuli compared to neurotypical children.
Earlier research suggested that computer vision can automatically measure these head movement patterns.
This larger study confirmed that computer vision can be used to objectively and automatically measure head movement dynamics in toddlers from videos recorded via a digital app during a well-child checkup in primary care.
Rate, acceleration, and complexity of head movements were found to be significantly higher in autistic toddlers compared to neurotypical toddlers.
Combining head movements with other behavioral biomarkers, a multimodal computer vision and machine learning-based digital autism screening tool can be developed, offering quantitative and objective characterization of early autism-related behaviors.
The disclosure of each of the following references is incorporated herein by reference in its entirety to the extent that it is not inconsistent herewith and to the extent that it supplements, explains, provides a background for, or teaches methods, techniques, and/or systems employed herein. The numbers below correspond to the superscripted numbers in EXAMPLE 5.
Differences in social attention are well-documented in autistic individuals, representing one of the earliest signs of autism. Spontaneous blink rate has been used to index attentional engagement, with lower blink rates reflecting increased engagement. We evaluated novel methods using computer vision analysis (CVA) for automatically quantifying patterns of attentional engagement in young autistic children, based on facial orientation and blink rate, which were captured via mobile devices. Participants were 474 children (17-36 months old), 43 of whom were diagnosed with autism. Movies containing social or nonsocial content were presented via an iPad app, and simultaneously, the device's camera recorded the children's behavior while they watched the movies. CVA was used to extract the duration of time the child oriented towards the screen and their blink rate as indices of attentional engagement. Overall, autistic children spent less time facing the screen and had a higher mean blink rate compared to neurotypical children. Neurotypical children faced the screen more often and blinked at a lower rate during the social movies compared to the nonsocial movies. In contrast, autistic children faced the screen less often during social movies than during nonsocial movies and showed no differential blink rate to social versus nonsocial movies.
A large body of literature has utilized eye tracking to document differences in gaze patterns to social versus nonsocial stimuli in autistic individuals across the lifespan1-3. While the majority of studies of attention in autism have focused on gaze patterns, spontaneous eye blink rate has also been used to assess attention4. Studies have demonstrated task-related modulation of blink rate, with rate of blinking inversely related to level of encoding of information in working memory and attentional engagement5-7. The evolutionary basis of varying blink rate stems from the idea that real-time assessments of the salience and value of information unconsciously change blink rate to increase or decrease the amount of visual information that is processed8. Evidence suggests a connection between spontaneous blink rate and striatal dopamine activity, with decreased blink rate found in persons with Parkinson's disease, attention-deficit/hyperactivity disorder (ADHD), and fragile X syndrome9-11. Hornung et al.12 found that, compared to neurotypical children, blink rate and theta spectral EEG power, another measure of attentional engagement, were both reduced in autistic children. Another study using eye tracking found that neurotypical children exhibited lower blinking when watching scenes with high affective content, whereas autistic children blinked less frequently when looking at physical objects13. These results are consistent with findings that autism is associated with reduced social attention, which is evident as early as 2-6 months of age14,15.
Traditionally, eye tracking has been used to measure gaze and blink rate patterns. We explored whether it was possible to detect meaningful patterns of attention via blink rate in toddlers using computer vision analysis (CVA) based on data collected via an application (app) on a smart tablet without the use of additional equipment. In a previous study, we demonstrated that it was possible to reliably measure atypical patterns of gaze, characterized by reduced attention to social stimuli, via CVA in young autistic toddlers compared to their neurotypical peers16.
The current analysis extends previous work by studying blink rate as an additional method for capturing patterns of attentional engagement in toddlers while they watched a series of strategically-designed social and nonsocial movies on a smart tablet. Along with blink rate, we also estimated the duration of the child orienting towards the tablet's screen, denoted as total time facing forward (TFF). We predicted that neurotypical toddlers would reduce their blinking and thus exhibit lower blink rate when viewing movies with high social content, as compared to those without social content. In contrast, we predicted that autistic toddlers would either fail to exhibit a differential blink rate to movies with social versus nonsocial content or show lower blink rates when viewing movies with nonsocial content, suggesting higher attentional engagement when viewing nonsocial stimuli.
Effects of group and stimulus type on facing forward and blink rate variables. To estimate the main effects of group and stimulus type (social versus nonsocial movies) and their interaction effects for total time facing forward (TFF) and blink rate, a 2×2 mixed ANOVA was conducted. This analysis was based on the movies that had primarily social or nonsocial content (refer to the “Methods and materials” section along with
A main effect of group was found for mean TFF (F(1, 440)=40.76, P<0.0001, ηp²=0.086) and mean blink rate (F(1, 440)=17.63, P<0.0001, ηp²=0.04). On average, autistic children had lower mean TFF and higher mean blink rate compared to neurotypical children. A main effect of stimulus type was also found for TFF (F(1, 440)=98.17, P<0.0001, ηp²=0.18) and blink rate (F(1, 440)=54.30, P<0.0001, ηp²=0.12), indicating that, on average, participants exhibited higher TFF and lower blink rate during the social movies compared to nonsocial ones.
Interaction effects between group and stimulus type were found for both mean TFF (F(1, 440)=28.27, P<0.0001, ηp²=0.06) and mean blink rate (F(1, 440)=7.78, P=0.005, ηp²=0.02). Comparisons of the mean TFF and blink rate values within the neurotypical and autistic groups during social versus nonsocial movies are shown in
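As a sanity check on effect sizes of this form, partial eta squared can be recovered directly from a reported F statistic and its degrees of freedom. The small discrepancies from the published values (e.g., 0.085 here vs. the reported 0.086 for the group effect on TFF) reflect rounding, since published values are typically computed from sums of squares:

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Recover partial eta squared from an F statistic:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)

# Group effect on mean TFF reported above: F(1, 440) = 40.76
eta_tff = partial_eta_squared(40.76, 1, 440)    # ~0.085, vs. reported 0.086
# Group effect on mean blink rate: F(1, 440) = 17.63
eta_blink = partial_eta_squared(17.63, 1, 440)  # ~0.039, vs. reported 0.04
```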
To ensure that the overall group difference in TFF was not driving the results, we repeated these analyses using only the participants with TFF>0.80 and found that the pattern of results, along with their statistical significance, remained consistent (see supplementary materials FIGS. S1 and S2 for more details and statistics). The numbers of participants with TFF>0.80 for the mean TFF and mean blink rate of the social and nonsocial movies were: autistic group (N=20) and neurotypical group (N=394). The numbers of participants for each individual movie are presented in Figure S2 of the supplementary material. Furthermore, to test whether the participants' age had any effect on the measures, an ANCOVA was conducted using age as a covariate. The pattern of results remained consistent after including the covariate.
It is possible that the autistic children were facing forward less during the social movies because, on average, the social movies were longer and tended to come toward the end of the app administration, as compared to the nonsocial movies. To address this, group differences in TFF were also examined separately for each individual movie (
In addition to the estimation of the blink rate (see “Methods and materials”), in the supplementary material we present (i) the valid number of frames (Table S1) and (ii) the raw blink quantity, without normalization by the valid number of frames (Table S2), for both groups. The blink rate is the ratio of the raw blink quantity to the valid number of frames for each participant during a movie, since we wanted to estimate blinking only while the participants were ‘facing forward’ towards the movie. However, to ensure that the valid number of frames was not inflating the blink rate, we present a similar statistical analysis for the valid number of frames and the raw blink quantity (see Tables S1 and S2). The statistically significant differences between the two groups remained the same for the raw blink quantity. Furthermore, we observed only a moderate correlation (Pearson correlation coefficient, r=−0.45) between the mean TFF and mean blink rate. This level of correlation indicates that TFF and blink rate are two different measures that complement each other in quantifying the participant's engagement with the movies.
Distinguishing groups based on three CVA-based attention measures. We next examined how well the attention measures, mean TFF and mean blink rate, along with mean gaze percent social (MGPS; social attention variable) distinguished the two groups using a classification tool. MGPS was based on the percentage of time the child gazed at the social elements during “Blowing Bubbles” and “Spinning Top” which displayed both social and nonsocial elements separately either on the right or left side of the screen (see “Methods and materials” for details about the movies, and
We have included the MGPS for classification analysis because we excluded the movies “Blowing Bubbles” and “Spinning Top” in the estimation of mean TFF and mean blink rate. Since MGPS gives an estimate of the child's percentage of look duration towards the social part (left/right) of the screen, we explored its importance in complementing the mean TFF and mean blink rate for classification.
We considered mean values during the social movies (mean TFFsocial and mean blink ratesocial) for this analysis. These two measures were moderately negatively correlated with each other (r=−0.45), based on the Pearson correlation coefficient. The mean TFFsocial (r=0.13) was positively correlated and the mean blink ratesocial (r=−0.13) was negatively correlated with MGPS. We trained the logistic regression-based classifier using these three attention features and the participant diagnostic group as the classification target to assess how these measures can potentially be used to identify behaviors linked to autism (
Relationship between attention variables and clinical characteristics. For the autistic group, we examined the relationship between the mean TFF and blink rate during the social and nonsocial movies and several clinical variables, including Mullen Early Learning Composite Score and Visual Reception Score, and Autism Diagnostic Observation Schedule (ADOS) Calibrated Severity Scores (ADOS CSS total, restricted/repetitive behavior, social affect). As shown in Table 1, total time facing forward during the social movies was negatively correlated with ADOS total and social affect scores. Autistic children with higher total and social affect ADOS CSS spent less time facing forward during the social movies. Mean total time facing forward (TFF) during the nonsocial, but not the social, movies was negatively correlated with cognitive abilities (Mullen Early Learning Composite Score and Visual Reception Score). Children with higher cognitive abilities spent less time facing forward during the nonsocial movies. We did not find any relationships between the mean blink rate and the clinical variables (Table 1).
Research has consistently documented differences in attentional patterns in autistic individuals, characterized by reduced visual social engagement1. Such differences are apparent during infancy and offer a means of detecting early signs of autism14,15,17.
Thus, developing scalable, objective, and quantitative methods for measuring patterns of attentional engagement in infants and toddlers is an important goal. We have previously shown that CVA can be used to detect distinct patterns of gaze in autistic toddlers, characterized by reduced social attentional engagement, using relatively low-cost, scalable devices without any special set-up, equipment, or calibration16.
In the present study, we extend this work by demonstrating that using the same app shown on a tablet, we can use CVA to capture distinctive patterns of attentional engagement to social and nonsocial stimuli in autistic toddlers, based on facial orientation and blink rate. This offers an additional quantitative, objective approach to assessing early attention in toddlers. Overall, autistic toddlers spent less time with their face oriented forward to the movies and exhibited higher blink rates compared to neurotypical toddlers. Our finding of reduced attentional engagement, regardless of stimulus type, is consistent with past work18, performed with consumer-grade eye-tracking tools, indicating that reduced visual engagement in autistic toddlers is not limited to social stimuli, but also extends to nonsocial stimuli. This finding is also consistent with eye tracking studies that reported that autistic toddlers exhibit lower overall sustained attention to any dynamic stimuli19.
A recent review of studies using functional brain imaging to assess social and nonsocial reward processing in autistic individuals suggested that autism is associated with general differences in reward anticipation that are not specific to social stimuli20. Considering previous findings linking blink rate to reward circuitry mediated by dopaminergic activity11,12, it is possible that differences in blink rate in autistic children found in the present study are associated with alterations in brain circuitry related to reward anticipation while watching the movies.
Table 1. Correlations of mean total time facing forward with clinical measures: the Mullen Scales of Early Learning (Early Learning Composite Score, Visual Reception) and the ADOS Calibrated Severity Scores (Restricted/Repetitive Behavior, Social Affect, Total).
In addition to overall differences in attentional engagement, autistic and neurotypical toddlers displayed distinctive patterns of attentional engagement when viewing social compared to the nonsocial movies. These results align with previous findings indicating that toddlers later diagnosed with autism tend to exhibit reduced attention to social scenes in free-viewing eye tracking tasks14, evident as early as 6 months of age21.
Neurotypical children faced the screen more often and blinked at a lower rate during social than nonsocial movies, with large effect sizes, suggesting that the social stimuli had higher salience. In contrast, autistic children faced the screen less often during social than nonsocial movies and did not exhibit a differential blink rate to social versus nonsocial movies. This is consistent with a previous study of blink rate which found reduced blink rate in neurotypical children during viewing of social stimuli, possibly due to their increased engagement with the stimuli13.
Group comparisons showed that, on average, the neurotypical children faced toward the screen more often during the social movies than autistic children, whereas the two groups did not differ in their tendency to face toward the screen during the nonsocial movies. The combination of three different measures of attentional engagement (facing the screen and blink rate during social movies and percent time gazing at social stimuli) distinguished between autistic and neurotypical children with an AUC=0.82.
Limitations of this study include the sample size, which, despite being relatively large, did not offer sufficient power to determine the influence of sex and other demographic characteristics, such as race and ethnicity. Future studies are planned to assess the generalizability of these findings to diverse populations. Such studies are particularly important in light of previous findings linking differences in gaze patterns to same- versus different-race face stimuli22,23. Moreover, future studies will be needed to examine the specificity of the findings to autism by directly comparing blink rate and facial orientation during viewing of social and nonsocial stimuli in autistic children to that of children with other neurodevelopmental disorders, such as ADHD and language or developmental delay.
By combining these novel indices of attention with other digital phenotypic features, such as facial dynamics24,25, orienting26, and head movements27,28, in the future, it may be possible to develop a scalable robust phenotyping tool to detect autism in toddlers, as well as monitor longitudinal development and response to early intervention.
Participants. Participants were 474 toddler-age children recruited during their well-child checkups at four pediatric primary care clinics. Based on DSM-5 criteria, 43 toddlers were subsequently diagnosed with autism spectrum disorder. Further, 15 toddlers were diagnosed with language delay/developmental delay, and the remaining 416 participants were neurotypical (NT). Inclusion criteria were: (i) age 16-38 months and (ii) caregiver's primary language was English or Spanish. Exclusion criteria were: (i) hearing or vision impairments; (ii) the child was too upset or ill during the visit; (iii) the caregiver expressed they had no interest or did not have enough time; (iv) the child would not stay in their caregiver's lap, or the app or device failed to upload data, or the clinical information was missing; and (v) presence of a significant sensory or motor impairment that precluded the child from watching the movies and/or sitting upright.
Ethical considerations. The study protocols were reviewed and approved by the Duke University Health System Institutional Review Board (Pro00085434, Pro00085435). All the methods used in this study were performed in accordance with all relevant guidelines and regulations. Informed consent was obtained from all participants' parents or their legal guardians. Informed consent was obtained from actors shown in
Clinical measures. Modified checklist for autism in toddlers: revised with follow-up (M-CHAT-R/F). A commonly used screening questionnaire, the M-CHAT-R/F29, was administered to all participants. The caregiver-completed M-CHAT-R/F (20 questions) was used to evaluate the presence/absence of autism-related symptoms.
Diagnostic and cognitive assessments. Participants whose M-CHAT-R/F score was ≥3 initially, or ≥2 after the follow-up questions, or whose pediatrician or caregiver expressed developmental concerns, were referred for diagnostic evaluation. The Autism Diagnostic Observation Schedule-Toddler Module (ADOS-2) was administered by a research-reliable licensed psychologist from the study team, who determined whether the child met DSM-5 criteria for autism30.
The Mullen Scales of Early Learning31 was used to assess the participant's cognitive and language abilities.
Autistic (N=43). This group included toddlers with a positive M-CHAT-R/F score and/or developmental concerns raised by the pediatrician/caregiver who subsequently met DSM-5 diagnostic criteria for autism spectrum disorder, with or without developmental delay, based on the ADOS-2, Mullen Scales, and clinical judgment by a research-reliable psychologist.
Neurotypical (N=416). This group included toddlers having a high likelihood of typical development with an M-CHAT-R/F score ≤1 and no developmental concerns raised by the pediatrician/caregiver, or those who had a positive M-CHAT-R/F score and/or the pediatrician/caregiver raised concerns but then were determined to not have developmental or autism-related concerns by the psychologist based on the ADOS-2, cognitive testing via Mullen Scales, and clinical judgment. Table 2 shows the participants' demographic characteristics for the autistic and neurotypical groups, consisting of 459 participants. There was another group of participants (N=15) who had a positive M-CHAT-R/F score and received a diagnosis of language delay/developmental delay (LD-DD) without autism. Children included in the LD-DD group were those who had failed the M-CHAT-R/F or had provider or caregiver developmental concerns, were referred for evaluation and administered the ADOS-2 and Mullen Scales and were then determined by a licensed psychologist not to meet DSM-5 criteria for autism. All children in the LD-DD group scored ≥9 points below the mean on at least one Mullen Early Learning Subscale (1 SD=10 points). Given the small sample size, we present data for the LD-DD group only in the supplementary materials (refer to Table S1, FIGS. S3 and S4). The demographic characteristics of 474 participants, including the LD-DD participants, are presented in Table S3.
Application (app) administration and stimuli. The app was administered on a tablet (iPad) that displayed developmentally appropriate, short social and nonsocial movies during the child's well-child visit. The tablet was mounted on a tripod placed ~60 cm from the child while the caregiver held the child on their lap. Any other family members (e.g., siblings) and the research staff who administered the app stayed behind both the caregiver and the child. The tablet's frontal camera recorded video of the child at 30 fps, which was then used for CVA to automatically capture the child's behavioral responses. The social and nonsocial movies were presented in the same order for all participants, as described next. The total duration of the movies was about 8 min. All movies contained both visual and auditory stimuli, described below. In both the social and nonsocial movies, visual and auditory stimuli were sometimes synchronized (e.g., “Dog in the Grass” and “Rhymes”) and sometimes non-synchronized (e.g., “Floating Bubbles” and “Make Me Laugh”). Nonsocial movies contained dynamic objects with sound, unlike the social movies, which had higher social content with ethnically and racially diverse human actors in the scenes. All the social movies depicted human actors. The language used by the actors was English or Spanish, depending on the child's primary language at home.
Demographic characteristics for neurotypical and autistic groups. The age (in months) at which participants received their diagnosis (ADOS-2): M=23.9, SD=4.5. The interval (in months) between the age at diagnosis and the app administration: M=0.7, SD=1.2. ADOS-2 Autism Diagnostic Observation Schedule-Second Edition. a Significant difference between the two groups based on ANOVA test. b Significant difference between the two groups based on Chi-Square test.
Estimation of ‘facing forward’ and blink rate variables. We first used CVA to determine the amount of time the child's face was oriented toward the screen of the device (‘facing forward’). A face detection algorithm32 was used to capture the child's face in each frame of the recorded video. In order to track only the participant's face and ignore all other faces in the frame, we performed a semi-supervised face detection algorithm (for details, see Refs. 16,26). Subsequently, we extracted 49 facial landmark points comprising 2D-positional coordinates33 that were time-synchronized with the movies. Using the facial landmarks, for each frame, we computed the child's head pose angles relative to the tablet's frontal camera: θyaw (left-right), θpitch (up-down), and θroll (tilting left-right) (as described in Ref.34).
Facing forward. A child's orientation towards the screen, i.e. ‘facing forward’ during any given frame, was defined using their (i) head pose angle, (ii) eye gaze, and (iii) rapidity of head movement. A head pose threshold of |θyaw|<25° was used, acting as a proxy for attentional focus on the screen, consistent with our previous work27,34, which is supported by the central bias theory for gaze estimation35,36. Then, for each frame, we checked whether the estimated gaze of the participant was on the tablet's screen and whether their eyes were open. The participant's gaze information was extracted using an automatic gaze estimation algorithm based on a pre-trained deep neural network16,37. Finally, we excluded the frames where the head was moving rapidly (this can lead to errors in the CVA). To this end, we first performed smoothing of the head pose signal θyaw, obtaining θyaw′. The head was considered to be moving rapidly if θyaw′ in the current frame exceeded 150% of its value in the previous frame. Finally, the total facing forward variable (TFF) was estimated as the percentage of frames ‘facing forward’ out of the number of frames for each movie (ranging between 0 and 100). Details on the algorithm are presented in the supplementary materials, Algorithm S1.
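The per-frame rule above can be sketched as follows. This is a simplified, hypothetical re-implementation, not the study's Algorithm S1: the |θyaw|<25° threshold and the 150% rapid-movement rule come from the text, while the 3-frame moving-average smoother, the per-frame gaze/eyes-open flags, and the 1-degree floor on the rapid-movement comparison are our stand-ins for the full CVA pipeline:

```python
def total_facing_forward(yaw_deg, gaze_on_screen, eyes_open):
    """Return TFF: the percentage of frames (0-100) scored 'facing forward'."""
    n = len(yaw_deg)
    # Light smoothing of the yaw signal (3-frame moving average).
    yaw_s = [
        sum(yaw_deg[max(0, i - 1): i + 2]) / len(yaw_deg[max(0, i - 1): i + 2])
        for i in range(n)
    ]
    facing = 0
    for i in range(n):
        if abs(yaw_s[i]) >= 25.0:                 # head turned away from screen
            continue
        if not (gaze_on_screen[i] and eyes_open[i]):
            continue
        # Exclude rapid head movement: smoothed yaw jumping to >150% of the
        # previous frame's value (the 1-degree floor is our assumption, to
        # avoid flagging jitter around zero).
        if i > 0 and abs(yaw_s[i]) > 1.5 * max(abs(yaw_s[i - 1]), 1.0):
            continue
        facing += 1
    return 100.0 * facing / n
```

For example, a child steadily oriented at 5° of yaw with gaze on screen scores TFF=100, while a child turned 60° away scores TFF=0.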
Blink rate. We estimated the participant's number of blinks while they watched each of the presented movies, as described next. OpenFace, a facial analysis toolkit38 that provides facial action units on a frame-by-frame basis, was used. These action units are based on the standard facial action coding system39. For the blinking action, we used action unit 45 (AU45) to estimate the participant's blinks. Smoothing of the AU45 time-series signal was performed, followed by detection of the number of peaks, which are associated with blink actions (see supplementary materials, Algorithm S2). To obtain the blink rate, we normalized the number of blinks with respect to the number of valid frames. The valid frames were defined as frames during which (i) the participant was ‘facing forward’ (see above) and (ii) the confidence outcome of OpenFace was at or above the recommended threshold (i.e., 0.75)38.
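The blink-counting step can be sketched as follows. AU45 is OpenFace's actual eye-closure action unit, but the 3-frame smoother, the 0.5 threshold, and the rising-edge peak detection here are our simplifications of the paper's Algorithm S2 (not reproduced in this document), and the sample trace is synthetic:

```python
def count_blinks(au45, threshold=0.5):
    """Count blink events as upward threshold crossings of the smoothed
    AU45 (eye-closure) intensity signal."""
    # 3-frame moving average as a simple smoother.
    smoothed = [
        sum(au45[max(0, i - 1): i + 2]) / len(au45[max(0, i - 1): i + 2])
        for i in range(len(au45))
    ]
    blinks = 0
    for prev, curr in zip(smoothed, smoothed[1:]):
        if prev < threshold <= curr:  # rising edge = one blink onset
            blinks += 1
    return blinks

def blink_rate(au45, n_valid_frames):
    """Normalize the raw blink count by the number of valid frames."""
    return count_blinks(au45) / n_valid_frames

# Synthetic AU45 trace containing three eye-closure peaks:
signal = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
```

On this trace, `count_blinks(signal)` finds three blinks, and dividing by the number of valid frames yields the normalized rate used in the analyses.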
Social attention variable using eye gaze estimation. The “Spinning Top” and “Blowing Bubbles” stimuli presented social (actor/actress) and nonsocial (toys/bubbles) components on equal spatial halves of the screen, on the right or left side (see
Statistical analysis. A 2×2 mixed ANOVA was used to estimate the main effects of (i) participant group and (ii) movie type (social and nonsocial), and their interaction effects, via the Python method pingouin.mixed_anova from the Pingouin package version 0.5.240. The Mann-Whitney U test was used to estimate the statistical significance of differences between the groups, using the Python method pingouin.mwu. Within-group comparisons were performed using the Wilcoxon signed-rank test via pingouin.wilcoxon. Statistical power was reported with effect sizes: ‘r’ for pingouin.mwu and pingouin.wilcoxon, and ‘ηp2’ for the ANOVA. Additionally, analysis of covariance (ANCOVA) using pingouin.ancova was performed to determine the influence of covariates. To assess the contribution of the three attention features (TFF, blink rate, and MGPS), either individually or in combination, to distinguishing the autistic and neurotypical groups, we used linear logistic regression from the sklearn Python package version 0.23.241. Classification performance was compared using the area under the receiver operating characteristic curve with leave-one-out cross-validation42. Using the Hanley and McNeil method43, we present the 95% confidence interval (CI).
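The between-group comparison with its effect size ‘r’ can be illustrated as follows. This sketch uses scipy rather than pingouin.mwu, and computes r via the common normal-approximation definition r = |Z|/√N (an assumption; Pingouin reports its own effect-size measures).

```python
import math
from scipy.stats import mannwhitneyu

def mwu_with_effect_size(x, y):
    """Mann-Whitney U test between two groups, plus an effect size r.

    r is computed as |Z| / sqrt(N), where Z is the normal approximation
    of the U statistic (no tie correction); values range from 0 to 1.
    """
    u_stat, p_value = mannwhitneyu(x, y, alternative="two-sided")
    n1, n2 = len(x), len(y)
    mu_u = n1 * n2 / 2.0                          # mean of U under H0
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # SD of U under H0
    z = (u_stat - mu_u) / sigma_u
    r = abs(z) / math.sqrt(n1 + n2)
    return p_value, r
```

For two well-separated samples the test returns a small P-value and a large r, mirroring how the group comparisons above are reported with both significance and effect size.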
Analysis of participants who ‘faced forward’ for 80% of the movie duration. An analysis of blink rate was conducted to determine whether similar results are obtained when only participants who faced the screen (total facing forward, TFF) at least 80% of the time are included. As shown in Figure S1 and Figure S2, results were consistent with those in
Table S1 presents the mean and standard deviation of the percentage of valid frames used for the blink rate computation for each of the groups. Similarly, Table S2 reports the raw blink counts for both groups. When comparing the two groups for each movie using the Mann-Whitney U test, both (i) the percentage of valid frames and (ii) the raw blink count showed statistical significance similar to that of the blink rate.
Results for children with language delay/development delay (LD-DD). Table S3 shows the details of all the participants (neurotypical, autistic, and LD-DD). Figure S3 shows the mean total facing forward (TFF) and mean blink rate for the social and nonsocial stimuli. The distribution of the LD-DD group appears to have attentional patterns similar to those of the neurotypical group, unlike the autistic group, indicating the potential specificity of the proposed CVA-based measures for autism. Results for the TFF and blink rate for individual movies are presented in Figure S4. The statistical results for the individual movies in Figure S4 (P-values and effect sizes) are presented by comparing the autistic and LD-DD groups only. Overall, the distribution of the LD-DD group was observed to be different from that of the autistic group and similar to that of the neurotypical group.
12 (74.42%)b
1 (0.24%)
6 (1.44%)
43 (10.33%)
6 (13.95%)
0 (0.00%)
7 (16.28%)
9 (2.16%)
0 (0.00%)
10 (66.67%)b
0 (0.00%)b
4 (0.96%)
aSignificant difference between the two groups based on ANOVA test.
bSignificant difference between the two groups based on Chi-Square test.
The disclosure of each of the following references is incorporated herein by reference in its entirety to the extent that it is not inconsistent herewith and to the extent that it supplements, explains, provides a background for, or teaches methods, techniques, and/or systems employed herein. The numbers below correspond to the superscripted numbers in EXAMPLE 6.
It will be understood that various details of the subject matter described herein may be changed without departing from the scope of the subject matter described herein. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, as the subject matter described herein is defined by the claims as set forth hereinafter.
This application claims benefit of and priority to U.S. Provisional Patent Application Ser. No. 63/523,761, filed Jun. 28, 2023, the contents of which are herein incorporated by reference in their entirety; and claims benefit of and priority to U.S. Provisional Patent Application Ser. No. 63/523,803, filed Jun. 28, 2023, the contents of which are herein incorporated by reference in their entirety.
This invention was made with government support under grant nos. HD093074, MH121329, and MH120093 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
Number | Date | Country
---|---|---
63523803 | Jun 2023 | US
63523761 | Jun 2023 | US