Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder/condition in which individuals experience difficulties in social communication and interaction and exhibit limited or repetitive behaviors and interests. Additionally, autistic individuals may have alternative learning styles, movements, and attention patterns. Several studies have consistently shown that ASD is more commonly found in males than females, with an approximate ratio of 3 to 1. One of the approaches used to investigate neurodivergence associated with ASD is Functional Connectivity (FC) analysis of functional Magnetic Resonance Imaging (fMRI) data. FC analysis helps to examine statistical dependence between the activity of different brain regions based on their blood oxygenation levels measured by fMRI. Hence, FC represents the extent to which various brain regions exhibit synchronized activity over a period of time, which is commonly believed to be representative of the structural and functional organization of the brain.
Functional Connectivity (FC) studies in ASD have led to two main theories about the connectivity of the brains of individuals with ASD: under-connectivity and over-connectivity. Under-connectivity is defined as lower statistical correlation between the activity of brain regions compared to a neurotypical population. Conversely, over-connectivity is understood as higher statistical correlation between different areas of the brain in affected individuals compared to unaffected individuals. More recent studies indicate that both over- and under-connectivity patterns are likely present in the brains of individuals with ASD.
Traditional methods for FC analysis include Seed-Based Correlation Analysis (SCA) [1], Independent Component Analysis (ICA) [2], graph theory-based analysis [3], clustering-based approaches [4], dynamic connectivity analysis [5], Granger causality analysis [6], and dynamic causal modeling [7]. While these approaches have helped uncover neurodivergent patterns in fMRI data, they entail certain limitations, such as inherent biases or limited interpretability. Several inconsistencies have been reported in studies using these methods when examining functional connectivity patterns in fMRI in ASD. The discrepancies are mainly attributed to the varied age and sex compositions within the study samples and the diverse nature of ASD. Notably, an apparent trend of under-representation of females with ASD in FC studies of fMRI can be seen. Limited interpretability arises from technical constraints inherent to each of these methods.
This disclosure is structured as follows: we discuss the pertinent literature on traditional FC methods and the utilization of Variational AutoEncoder (VAEs) (
Various methods have been developed to examine brain functional connectivity using fMRI data [16], including Seed-Based Correlation Analysis (SCA) [1], Independent Component Analysis (ICA) [2], and graph theory-based analysis [3]. SCA involves selecting a Region of Interest (ROI) and computing the correlations between its time series and those of other brain regions. High correlations indicate over-connectivity, and low correlations indicate under-connectivity. However, SCA can introduce bias through ROI selection, overlooking important connectivity patterns outside the chosen regions [11]. ICA, on the other hand, is a data-driven, multivariate method that decomposes fMRI data into spatially independent components, each representing a unique spatial pattern associated with a distinct time course [2,12]. ICA has been applied to reveal lower-level spatial and temporal patterns in brain connectivity. Nevertheless, a drawback of ICA is that the signal from a single brain region may appear in multiple components within the lower-dimensional space, complicating the identification of high-level correlations.
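The seed-based procedure described above can be illustrated with a minimal sketch. It assumes each region is already reduced to a single time series; the region names and values are hypothetical and serve only to show how a seed ROI is correlated with the remaining regions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def seed_based_correlation(region_series, seed):
    """Correlate the seed ROI time series with every other region."""
    seed_ts = region_series[seed]
    return {
        name: pearson(seed_ts, ts)
        for name, ts in region_series.items()
        if name != seed
    }


# Toy time series (hypothetical values, four time points per region).
toy = {
    "seed_roi": [1.0, 2.0, 3.0, 4.0],
    "region_a": [2.0, 4.0, 6.0, 8.0],  # rises together with the seed
    "region_b": [4.0, 3.0, 2.0, 1.0],  # falls as the seed rises
}
sca_map = seed_based_correlation(toy, "seed_roi")
```

Because only the chosen seed is compared against the other regions, any relationship between region_a and region_b is never computed, which is the ROI-selection bias noted above.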
Graph theory provides a framework for investigating local and global connectivity patterns. However, effectively capturing the temporal dynamics inherent in fMRI data presents a significant challenge. More advanced traditional approaches to FC analysis include clustering-based approaches [4], dynamic connectivity analysis [5], Granger causality analysis [6], and dynamic causal modeling [7]. Most studies using traditional methods have focused on fMRI data from males with ASD, and research specifically exploring females with ASD has been lacking. When the dataset is imbalanced, SCA, ICA, and graph-based analyses face several challenges. For example, SCA is often used to compare connectivity patterns between different subgroups; thus, an imbalance in the studied data can influence the statistical power and robustness of the comparisons. In ICA, while the analysis is not inherently affected by class imbalance, subsequent classifiers that use ICA-derived features may favor the majority class, affecting classification performance. In graph-based methods, graph construction could also be skewed by the greater presence of certain populations. Therefore, there is a need for an approach that encompasses both the spatial and temporal distribution of the data and is robust to under-representation in the dataset.
The most closely related work is the paper by Zuo et al., in which the researchers utilized a disentangled VAE to identify structural and functional connectivity differences between controls, individuals with early mild cognitive impairment (MCI), and individuals with late MCI [18]. Using a graph convolutional VAE, the researchers identified under- and over-connectivity patterns associated with the progression of MCI. Likewise, a study by Choi et al. applied a Deep Neural Network (DNN)-based VAE to analyze connectivity patterns in ASD [19]. That study also presented under- and over-connectivity patterns correlated with full-scale IQ scores.
A considerable number of encoder and decoder architectures have been studied for fMRI applications, and they vary depending on the main objective of the application. The most common architectures include convolutional layers (CNN), recurrent layers (RNN), and combinations of the two in sequence or in parallel. CNN layers have proven helpful in identifying spatial correlations; however, the temporal patterns of the decoded data are not meaningful, since convolution does not capture the temporal dynamics. Conversely, recurrent layers have shown better temporal feature extraction, but spatial patterns are not well preserved. Therefore, we believe that there is a need to evaluate different model architectures.
The most commonly studied brain networks in ASD include the Default Mode Network (DMN) and the limbic, visual, somatomotor, and salience networks. The regional components of each of these networks tend to vary slightly from study to study. The DMN is a large-scale brain network that is most active during rest periods or when the mind is wandering [26]. It is involved in various cognitive processes such as self-referential thinking, episodic memory retrieval, and social cognition [26]. In most studies, the DMN includes regions such as the medial prefrontal cortex, the posterior cingulate cortex, and the medial temporal lobes. The limbic network is a group of interconnected structures that play a critical role in emotion, motivation, and memory processing [27]. The limbic network is closely associated with the management of emotional responses, the processing of reward and punishment, and the formation and retrieval of memories. Key structures in the limbic system include the amygdala, hippocampus, and cingulate gyrus [28]. The visual network is responsible for processing visual stimuli, and its nodes are located primarily in the occipital lobe [29]. The somatomotor network is involved in the planning, enactment, and management of voluntary movements. It includes the primary motor cortex, the supplementary motor area, and the primary somatosensory cortex, all located in the frontal and parietal lobes. Finally, the salience network is a large-scale brain network that is involved in capturing attention and focusing it on relevant internal and external stimuli [30]. Key regions within the salience network include the anterior insula and the dorsal anterior cingulate cortex [31].
Previous findings suggest that under-connectivity between various brain networks is associated with the social impairments and deficits observed in ASD. Most under-connectivity patterns were associated with the DMN, including decreased interconnectivity between DMN-limbic, DMN-visual, and DMN-somatomotor networks. For example, Abrams et al. reported under-connectivity between the DMN (pSTS with orbitofrontal, temporal lobe) and the limbic network (amygdala), suggesting that individuals with ASD experience a less pleasant response to human voice processing [32]. Under-connectivity between the DMN (Precuneus (PrC)) and the visual cortex has also been previously reported [33]. However, that study reported that this under-connectivity pattern was not found to be related to socio-behavioral deficits. Finally, under-connectivity between the DMN and several regions in the somatomotor network has also been reported in multiple studies [34,35].
Over-connectivity patterns are primarily associated with the salience network. For example, a study by Green et al. demonstrated over-connectivity of the salience network with sensory processing areas, such as the visual and limbic networks, in individuals with ASD. It is believed that this over-connectivity may contribute to heightened responsiveness to irrelevant stimuli and deficits in social interactions [36]. The DMN-salience network pair was shown to have higher interconnectivity in ASD subjects compared to Typically Developing (TD) subjects in work by Yerys et al. [34], which has been hypothesized to relate to the ability to switch between intra-person and extra-person processing.
A handful of studies have specifically examined the difference between female and male functional connectivity. One of the few studies focused on sex-related differences revealed that the commonly reported DMN hypoconnectivities are primarily present in male populations [37]. Increased connectivity in the female population compared to males has also been supported by the studies by Lawrence et al. and Smith et al. [39].
Examples/papers on diagnosis based on FC analysis:
Examples/papers on progress monitoring based on FC analysis or fMRI data include: Binnewijzend, Maja A. A., et al. “Resting-state fMRI changes in Alzheimer's disease and mild cognitive impairment.” Neurobiology of Aging 33.9 (2012): 2018-2028; and Yang, Fu-Chi, et al. “Altered hypothalamic functional connectivity in cluster headache: a longitudinal resting-state functional MRI study.” Journal of Neurology, Neurosurgery & Psychiatry 86.4 (2015): 437-445.
To address the issues of limited interpretability and under-representation, a system and method are provided with a novel approach to FC analysis of fMRI data using Variational AutoEncoders and Conditional Variational AutoEncoders. The Variational AutoEncoder (VAE) is a deep generative model that learns to encode data into a low-dimensional latent space and then decodes the low-dimensional features back to the original data [8]. The Conditional Variational AutoEncoder (CVAE) is an extension of the standard VAE that incorporates conditional information, such as additional class features or attributes, into the generative model to enable targeted data synthesis [9]. This disclosure applies three different VAE architectures to FC analysis for individuals with ASD. We then apply phenotypic data to the VAEs to reduce sex-related bias. For a more quantitative and structured analysis, we have employed three VAE architectures commonly used in the fMRI domain: a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a hybrid model combining a CNN and an RNN in parallel. Our evaluation of the VAE and CVAE compares their performance in reconstructing neurotypical samples and their efficacy in conducting FC analysis for fMRI samples of individuals with ASD. It also compares the identified FC divergences between female and male populations for both the VAE and the CVAE. We aim to provide a structural and systematic investigation of diverse AE architecture variations in the fMRI domain, specifically addressing the issues of dynamic processing of highly complex brain imaging data and sex under-representation with statistical modeling.
Current practice relies on a doctor's visual assessment or on traditional machine learning algorithms with statistical analysis models. The present disclosure uses generative AI to model resting-state fMRI signals and then uses the generative models to detect connectivity changes, taking into account other features such as age or sex.
The present disclosure provides a computational model that processes fMRI data for neural connectivity analysis and for ASD diagnosis and progress estimation.
These and other objects of the disclosure, as well as many of the intended advantages thereof, will become more readily apparent when reference is made to the following description, taken in conjunction with the accompanying drawings. This summary is not intended to identify all essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter. It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide an overview or framework to understand the nature and character of the disclosure.
In describing the present disclosure illustrated in the drawings, specific terminology is resorted to for the sake of clarity. However, the present disclosure is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose.
Turning to the drawings,
Thus, with reference to
As shown in
The CNN encoder 113 and RNN encoder 114 are connected to the Fully Connected input of a Fully Connected layer 119, 120, respectively, of the CVAE Decoder block. The Fully Connected layer 119 input receives the third CNN encoder fMRI image data output and the third RNN encoder fMRI image data output.
The CNN decoder 116 and RNN decoder 117 are connected to the output of the Fully Connected layers 119, 120, respectively, of the CVAE Decoder block (that includes the CNN decoder 116 and the RNN decoder 117) to generate the synthetic fMRI data output.
The CNN decoder 116 is comprised of the Fully Connected layer 119 followed by transposed convolution layers with 128, 64, and 32 filters (three layers in the embodiment of
The RNN decoder 117 is connected in parallel with the CNN decoder 116. The RNN encoder branch 114 contains three unidirectional Long Short-Term Memory (LSTM) layers followed by a Fully Connected layer 118. The RNN decoder 117 includes a Fully Connected layer 120 followed by three LSTM layers. Latent features are fused using element-by-element multiplication. The RNN decoder 117 is connected to the output of the latent Z space 130. It receives the latent space vector (the third decoder output), which the CVAE encoder has encoded from the input fMRI data through dimensionality reduction, together with the conditional embeddings 131 (e.g., age, sex, and neurodivergent label), and generates an RNN decoder synthetic data output. The total number of latent features in the latent Z space 130 is equal to 2000.
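The element-by-element fusion mentioned above can be sketched as follows. This is only an illustration under the assumption that each encoder branch emits a feature vector of the same length; the function name and the toy values are hypothetical and are not taken from the disclosure.

```python
def fuse_latents(cnn_features, rnn_features):
    """Fuse two equal-length feature vectors by element-wise multiplication."""
    if len(cnn_features) != len(rnn_features):
        raise ValueError("both branches must emit vectors of the same length")
    return [c * r for c, r in zip(cnn_features, rnn_features)]


# Toy branch outputs (hypothetical values; the disclosure uses 2000 features).
cnn_out = [1.0, 2.0, 3.0]
rnn_out = [0.5, 0.0, 2.0]
fused = fuse_latents(cnn_out, rnn_out)
```

With multiplicative fusion, a feature survives only if both branches give it a non-zero value, so each branch can act as a gate on the other.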
The outputs from the CNN decoder 116 and the RNN decoder 117 are connected to a Fully Connected layer 118 of the CVAE decoder output layer. The Fully Connected layer 118 produces synthetic fMRI data. For example, the CVAE 115 (conditioned on sex, age, and neurodivergence subgroup label) can be applied to fMRI data that has been mapped to a predefined brain atlas. The CVAE 115 can be used to generate neurodivergent synthetic fMRI data corresponding to the conditions of sex, age, and neurodivergence subgroup label. In addition, the CVAE 115 can be used to generate, at the processing device, non-neurodivergent synthetic fMRI data corresponding to the sex and age conditions. The output from the CVAE 115 can then be used for functional connectivity analysis, in which differences between the neurodivergent synthetic fMRI data and the non-neurodivergent synthetic fMRI data are used to identify neurodivergences in the patient fMRI data. Of course, other suitable configurations for the CVAE 115 can be utilized, within the spirit and scope of the present disclosure. In some embodiments, a VAE can be utilized.
Accordingly, starting at step 202, the fMRI sample data is input by the image source 102 and optionally stored in the image storage 112. At step 204, the processor 110 maps the input fMRI image data to a brain map, for example based on Schaefer's atlas. The atlas is used to obtain the brain map: it parcellates the fMRI brain image data into specific regions and is used here to reduce the data dimensionality. The raw fMRI signals (sample signals shown in the middle of
At step 206, the processor 110 creates training and testing datasets, which comprise neurodiverse brain signals (e.g., brain signals from ASD and TD subjects) and a conditional embedding per sample (sex, age, subgroup). The training and testing datasets are randomly selected from the total dataset to provide balanced training and proper evaluation. The conditional embeddings 131 are specific to each subject, so they constitute part of the conditional variable inputs to the CVAE.
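One way to realize a random yet balanced split is sketched below. Splitting within each subgroup is an assumption made here for illustration, as are the record fields, the 20% test fraction, and the fixed seed; the disclosure states only that the sets are randomly selected.

```python
import random


def split_samples(samples, test_fraction, seed=0):
    """Shuffle within each subgroup, then hold out a fraction for testing."""
    rng = random.Random(seed)
    by_group = {}
    for sample in samples:
        by_group.setdefault(sample["subgroup"], []).append(sample)
    train, test = [], []
    for group in sorted(by_group):
        members = by_group[group][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical records: each carries its conditional variables (sex, age, subgroup).
samples = [
    {"id": i, "sex": "F" if i % 4 == 0 else "M", "age": 8 + i % 10,
     "subgroup": "ASD" if i % 2 == 0 else "TD"}
    for i in range(20)
]
train_set, test_set = split_samples(samples, test_fraction=0.2)
```

Keeping the conditional variables on each record means the same split can later supply both the signals and the per-sample conditions.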
At step 208, the processor 110 trains the CVAE to generate age-, sex-, and subgroup-specific samples using the training dataset (both ASD and TD samples).
At step 210, the processor 110 evaluates reconstruction and generative abilities on the testing dataset to validate the reconstruction rate of synthetic brain signals. The reconstruction rate is calculated with generative loss in comparison to the input brain signals.
At step 212, the processor 110, at the decoder 116, 117 (
At step 214, the processor 110, from the outputs of the decoder 116, 117, computes a correlation matrix between different networks for the input and for the synthetically generated outputs, for all the samples in the testing subset. The correlation matrix captures the over-connectivity (OC) and under-connectivity (UC), as visualized in
At step 216, the processor 110 calculates the statistical differences between the synthetically generated output and input. In the
At step 218, the processor 110 displays the results to the neurologist in the form of a chord diagram on the display device 104, as shown in
On the clinical front, this system 100 and method 200 significantly enhance early diagnosis and intervention strategies by providing more accurate and data-driven insights into ASD. Clinicians can leverage the model's predictions to identify atypical neurological patterns, enabling them to tailor treatment plans more effectively and improve the overall quality of care for individuals on the autism spectrum. Moreover, the system can streamline the diagnostic process, reducing the time and cost associated with traditional assessment methods.
This ML system also opens new avenues for research, pharmaceutical development, and therapeutic interventions. Pharmaceutical companies can utilize the insights generated by the system to identify potential targets for drug development, ultimately leading to more targeted and effective treatments for ASD. Additionally, the system can aid in market segmentation and the development of specialized products and services to better meet the unique needs of individuals with ASD and their families, thus fostering innovation and growth in the commercial sector.
The disclosure includes many illustrative embodiments, some of which are mentioned here. In a first embodiment, a computer-implemented method is provided for identifying neurodivergences in individuals using functional MRI (fMRI) data, comprising: mapping fMRI imaging data to a predefined brain atlas; training a conditional variational autoencoder (CVAE) model using the extracted features, wherein the CVAE model conditions on sex, age, and subgroup label; generating desired healthy-like synthetic fMRI samples using the trained CVAE model; and performing functional connectivity analysis on the synthetic fMRI samples to identify neurodivergences in individuals with ASD. In a second embodiment, the functional connectivity analysis includes calculating correlation matrices based on the synthetic fMRI samples; and applying a Welch t-test to the correlation matrices to identify significant differences in functional connectivity patterns between individuals with ASD and synthetically generated healthy-like samples.
In a third embodiment, the CVAE model is further configured to optimize latent space representations of the fMRI data to enhance the separability of individuals with ASD and healthy individuals. In a fourth embodiment, the CVAE model is trained using a deep learning architecture comprising multiple layers of neural networks. In a fifth embodiment, a computer system is provided for identifying neurodivergences in individuals, comprising: a data collection module for collecting fMRI data from individuals; a feature extraction module for extracting features from the fMRI data; a conditional variational autoencoder (CVAE) model trained to condition on sex, age, and diagnosis, configured to generate synthetic fMRI samples; a functional connectivity analysis module for performing functional connectivity analysis on the synthetic fMRI samples to identify neurodivergences in individuals. In a sixth embodiment, a non-transitory computer-readable storage medium has instructions that, when executed by a computer, cause the computer to perform the method of the first through fifth embodiments.
The ABIDE-I (Autism Brain Imaging Data Exchange) dataset is a publicly available, large-scale collection of resting-state fMRI data of individuals with ASD [46]. The ABIDE-I dataset consists of 1035 rs-fMRI scans, including 505 individuals with ASD and 530 neurotypical control subjects. The data were collected from 17 different imaging sites, each with its own scanning protocol. The dataset has undergone various preprocessing steps, including motion correction, spatial normalization, and noise reduction, to ensure uniform data quality and comparability across different sites. However, different imaging sites had different default fMRI scanners; therefore, Repetition Time (TR), Echo Time (TE), and flip angle degree are varied across sites. The subset of scans with TR of 2000 (ms) from the ABIDE-I dataset has been extracted. Thus, for the present system, we have only used data samples collected from 9 out of 17 sites, resulting in 236 ASD samples, 276 typically developing samples. The subjects were then randomly split into training and testing sets. The training and testing sets consisted of 231 control and 235 neurodivergent samples and 35 and 41 samples, respectively. In
The Autoencoder (AE) (
In the present system, the VAE is deployed as a deep generative model using different architectures of the encoder g(x; φ) and the decoder f(z; θ). The encoder learns to compress the high-dimensional input x (a parcels-by-time matrix) into a lower-dimensional latent representation z, and φ and θ are the learnable parameters of the encoder and decoder networks, respectively. The VAE aims to learn a model for the true data distribution, denoted by p(z, x). The latent space dimensionality is denoted as d (i.e., z ∈ ℝ^d). The variational posterior distribution is denoted by q(z|x), which is an approximation of the true posterior. The network is trained using the Evidence Lower Bound (ELBO) loss, which has a reconstruction term and a KL divergence term. The reconstruction term ensures that the VAE can accurately reconstruct the input data and is the expected negative log-likelihood, −E[log p(x|z)], where p(x|z) is modeled by the decoder part of the VAE. The KL divergence term keeps the variational posterior distribution, q(z|x), as close as possible to the prior distribution, p(z).
The ELBO loss, denoted as LELBO(x), can be written as:
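A standard way to write this loss, consistent with the reconstruction and KL divergence terms described above, is given below. The subscripts φ (encoder) and θ (decoder) follow the notation g(x; φ) and f(z; θ); the expression is the usual textbook form and is offered as an assumption about the intended equation rather than a verbatim reproduction.

```latex
\mathcal{L}_{\mathrm{ELBO}}(x)
  = -\,\mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]
  + D_{\mathrm{KL}}\!\left(q_{\phi}(z \mid x)\,\|\,p(z)\right)
```

The first term is the expected negative log-likelihood (reconstruction), and the second penalizes divergence of the variational posterior from the prior; minimizing their sum maximizes the evidence lower bound.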
During training, the encoder network g(x; φ) models the variational posterior distribution q(z|x). The encoder outputs the parameters of a Gaussian distribution, μ̃ and log σ̃², which represent the mean and log-variance of the latent space distribution, respectively. Sampling from q(z|x) allows us to generate new data samples similar to those present in the training data distribution.
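Sampling from a Gaussian posterior given its mean and log-variance is commonly done with the reparameterization trick, sketched below. The diagonal Gaussian posterior and the standard normal prior used for the closed-form KL term are assumptions made for illustration; the function names and numeric values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


rng = random.Random(0)
mu = [0.5, -1.0, 0.0]        # hypothetical encoder means
log_var = [0.0, -2.0, 0.0]   # hypothetical encoder log-variances
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Working with the log-variance keeps the standard deviation positive without constraining the encoder output, since exp(0.5 · log σ²) is always greater than zero.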
The system uses a CVAE for a more controlled fMRI sample reconstruction, as an example architecture shown in
Specifically, the generative process of the CVAE takes the form
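A common formulation of a conditional generative process is shown below as an illustration. The symbol c for the conditioning variables (e.g., sex, age, subgroup) and the conditional prior p(z | c) are assumptions introduced here; in many implementations the prior is simplified to p(z).

```latex
p_{\theta}(x \mid c) = \int p_{\theta}(x \mid z, c)\, p(z \mid c)\, dz
```

Here the decoder models p_θ(x | z, c), so the same latent vector z can be decoded differently depending on the condition c.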
And the ELBO loss can then be written as:
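A conditional counterpart of the ELBO loss, written in the usual form, is given below. The conditioning variable c and the conditional prior p(z | c) are assumptions introduced for illustration, with φ and θ denoting the encoder and decoder parameters as before.

```latex
\mathcal{L}_{\mathrm{ELBO}}(x, c)
  = -\,\mathbb{E}_{q_{\phi}(z \mid x, c)}\!\left[\log p_{\theta}(x \mid z, c)\right]
  + D_{\mathrm{KL}}\!\left(q_{\phi}(z \mid x, c)\,\|\,p(z \mid c)\right)
```

Compared with the unconditional loss, every distribution now also depends on c, which is what allows the condition to be changed at generation time.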
In the CVAE model, the reconstruction of a sample is dependent on the given set of input conditions. To generate a TD-like output for an atypical sample, the conditional variable must be adjusted to a control condition while retaining the remaining conditions unchanged. Consequently, when calculating the discrepancy between the atypical input and the reconstructed output, the difference is assumed to be solely attributed to the modified conditions. This ensures that the identified divergence depends exclusively on the altered conditional variable.
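The adjustment of a single conditional variable can be sketched as follows. The dictionary representation, the field names, and the label strings are hypothetical; the point is only that the subgroup label is switched to the control condition while sex and age are left untouched.

```python
def to_control_condition(condition, control_label="TD"):
    """Return a copy with only the subgroup label switched to the control group."""
    adjusted = dict(condition)
    adjusted["subgroup"] = control_label
    return adjusted


def changed_keys(original, adjusted):
    """List the condition fields whose values differ between the two dicts."""
    return sorted(k for k in original if original[k] != adjusted[k])


# Hypothetical condition for an atypical sample.
source_condition = {"sex": "F", "age": 12, "subgroup": "ASD"}
target_condition = to_control_condition(source_condition)
```

Returning a copy leaves the original condition intact, so the input sample can still be reconstructed under its own condition for comparison.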
Three commonly used VAE architectures in the fMRI domain were trained to learn a compact representation of the data from neurotypical control fMRI samples. A Convolutional Neural Network (CNN) variational autoencoder 113, 116, 115, Recurrent Neural Network (RNN) variational autoencoder 114, 117, 115, and a hybrid of CNN and RNN VAEs in parallel 113, 114, 116, 117, 115 (
A detailed summary of the structures of VAEs is shown in
All three VAEs have 2000 latent features extracted by the encoding part (d=2000), and the latent space was modeled using a mixture of Gaussian assumptions. Furthermore, all VAEs were optimized using the Adam algorithm with a learning rate of 0.0001. In the context of the CVAE, the model architectures remain the same; however, the phenotypic data embedding is incorporated by concatenating it with both the input of the encoder and the input of the decoder. The embedding dimensionality is set to 200, allowing it to be concatenated to the input matrix as another parcel feature, resulting in a total dimensionality of 201×200. Concatenation to the latent vector z results in a dimensionality of 2200. It is important to note that only neurotypical samples were used for training the VAEs; however, due to the conditional embedding, the CVAE allows for training on both neurotypical and neurodivergent samples. All of the experiments reported in this disclosure were performed on a server with an NVIDIA RTX 3090 running CUDA version 10.2 and PyTorch 1.13.1+cu117 [43]. This is the first system in the fMRI domain comparing different encoding and decoding architectures.
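The dimensionalities quoted above can be checked with a short sketch. It assumes an input of 200 parcels by 200 time points, which is inferred from the 201×200 figure rather than stated directly, and it uses plain lists with placeholder values in place of tensors.

```python
N_PARCELS = 200      # assumption, inferred from the 201 x 200 figure
N_TIMEPOINTS = 200   # assumption, inferred from the 201 x 200 figure
EMBED_DIM = 200      # embedding dimensionality stated in the text
LATENT_DIM = 2000    # latent features stated in the text


def append_condition_row(signal_matrix, embedding):
    """Append the embedding as one extra 'parcel' row of the input matrix."""
    if any(len(row) != len(embedding) for row in signal_matrix):
        raise ValueError("embedding length must match the number of time points")
    return signal_matrix + [list(embedding)]


def append_condition_to_latent(z, embedding):
    """Concatenate the embedding to the end of the latent vector."""
    return list(z) + list(embedding)


signals = [[0.0] * N_TIMEPOINTS for _ in range(N_PARCELS)]
embedding = [0.1] * EMBED_DIM
z = [0.0] * LATENT_DIM

encoder_input = append_condition_row(signals, embedding)
decoder_input = append_condition_to_latent(z, embedding)
```

Setting the embedding length equal to the number of time points is what lets the condition be stacked onto the input matrix as one additional row.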
Evaluation of VAE performance consisted of analysis of the reconstruction of the neurotypical samples, analysis of latent space features, and analysis of the regeneration abilities of the decoder.
Upon completion of the training, assessment of the VAE and CVAE reconstruction abilities involved three evaluation methods. The cosine similarity score was computed to capture the overall resemblance between the input and the reconstructed output. However, cosine similarity does not explicitly account for positional information. Thus, Pearson's correlation coefficient (R, PCC) was additionally calculated for the validation subsets of the data. Finally, the difference between the input and decoded output was evaluated through L1 (Mean Absolute Error (MAE)). L1 quantified the average absolute difference between the reconstructed BOLD signal intensity and the intensity of the original signal. To compute the L1 error, we leveraged the validation samples of the subgroup present during the training phase. We believe that a combination of these metrics will help us quantify the ability of VAEs and CVAEs to reconstruct samples from lower-dimensional data within the validation dataset.
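The three reconstruction metrics can be sketched as follows, assuming the input and the reconstructed output have each been flattened into a vector of equal length. The toy values are hypothetical and chosen so that the reconstruction differs from the input by a constant offset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_r(a, b):
    """Pearson correlation, i.e. cosine similarity of mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def mean_absolute_error(a, b):
    """L1 error: average absolute difference between corresponding entries."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.5, 2.5, 3.5, 4.5]  # constant offset of 0.5

cos_score = cosine_similarity(original, reconstructed)
pcc_score = pearson_r(original, reconstructed)
l1_score = mean_absolute_error(original, reconstructed)
```

In this toy case the correlation is perfect even though every value is off by 0.5, which illustrates why an absolute error measure is reported alongside the similarity scores.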
To assess the encoding abilities of each model, we encode both populations and conduct a comparative analysis of their latent representations. To determine the statistical significance of the differences in the encoding feature, a two-sided t-test is employed (p<0.05). The null hypothesis is that the mean of the neurotypical subgroup is equal to the mean of the neurodivergent. It is believed that the optimal encoder architecture will have a pronounced distinction in the latent space, meaning that the encoder learned to extract meaningful features from the input samples. Consequently, our objective is to reject the null hypothesis in favor of the alternative hypothesis, which is that the mean latent representations of the TD and ASD groups are different.
Evaluating the accuracy of synthetic data output by VAEs poses a significant challenge, especially when the ground-truth effects are unknown in real data. Therefore, to provide an initial assessment of atypical pattern detection, we calculate the L1 error of the synthetic samples. In the context of VAE systems, where the model is trained on TD samples only, we hypothesize that the L1 error will be more pronounced when reconstructing ASD validation samples than when reconstructing TD validation samples. For the CVAE systems, where the model architecture accommodates training on both TD and ASD samples, synthetic outputs were generated for the ASD validation dataset with the target conditional embedding of TD samples. Consequently, the L1 error is computed between the input ASD samples and the synthetically generated outputs.
In this system, we conducted FC analysis of the ASD subgroup alongside FC analysis for female and male populations within the ASD group. The FC analysis was performed using trained VAEs and CVAEs in three steps.
In VAE systems, we first processed each neurodivergent sample from the validation subset through all three architectures. We hypothesized that since VAEs were trained to reconstruct neurotypical samples only, the output of the neurodivergent sample from the decoding process would resemble the features of the training data (
For the CVAEs, the training data included both neurodivergent and neurotypical data, which allows for a more targeted generation of the synthetic output. The overall steps for FC with CVAEs were similar to those with VAEs, but the input embedding of the condition was adjusted to the desired output. For instance, if the input sample was a 12-year-old female with ASD, the embedding was adjusted to generate a neurotypical-like 12-year-old female sample. The remaining FC analysis steps (grouping parcels 204, calculating pairwise connectivity 214, conducting two-sided Welch t-tests 216, and visualizing chord diagrams 218) are the same as with VAEs.
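The grouping, pairwise connectivity, and Welch t-test steps can be illustrated with a minimal sketch. Averaging parcel time series within a network is an assumption about how parcels are grouped, and all parcel names, network labels, and connectivity values are hypothetical. The sketch returns the Welch t statistic and the Welch-Satterthwaite degrees of freedom; the p-value lookup is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def network_mean_series(parcel_series, parcel_to_network):
    """Average parcel time series within each network."""
    grouped = {}
    for parcel, series in parcel_series.items():
        grouped.setdefault(parcel_to_network[parcel], []).append(series)
    return {
        net: [sum(vals) / len(vals) for vals in zip(*members)]
        for net, members in grouped.items()
    }


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se_a, se_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, dof


# Hypothetical parcels and network labels for one subject.
parcels = {"p1": [1.0, 2.0, 3.0], "p2": [3.0, 4.0, 5.0], "p3": [3.0, 2.0, 1.0]}
labels = {"p1": "DMN", "p2": "DMN", "p3": "salience"}
nets = network_mean_series(parcels, labels)
dmn_salience = pearson(nets["DMN"], nets["salience"])

# Hypothetical per-subject connectivity values for one network pair.
input_conn = [0.62, 0.58, 0.65, 0.60]
synthetic_conn = [0.41, 0.45, 0.39, 0.43]
t_stat, dof = welch_t(input_conn, synthetic_conn)
```

In practice a statistics library would typically be used for the test itself; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test and also returns a two-sided p-value.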
To explore sex-related neurodivergence, we performed separate analyses for female and male samples from the validation dataset. To assess the influence of the conditions on the FC results, we calculate cosine similarity between VAE and CVAE pairwise correlation matrix between networks (
As detailed in Section 2.6, we begin by evaluating the reconstruction performance of all VAEs and CVAEs. Upon visual inspection of
Table 1 is a summary of reconstruction performance of VAE systems: cosine similarity scores and PCC for the neurotypical samples in the validation dataset. The average L1 reconstruction error for both neurotypical and neurodivergent samples within the validation dataset is presented.
Table 2 is a summary of reconstruction performance of CVAE systems: cosine similarity scores and PCC for the neurotypical samples in the validation dataset. Additionally, the average L1 reconstruction error for validation neurodivergent samples and synthetically generated neurotypical-like samples is presented.
To evaluate the encoding capabilities of each model, a comprehensive analysis was conducted on both neurotypical and neurodivergent samples from the validation dataset.
To further assess the performance of VAEs, we conducted a preliminary evaluation of atypical pattern detection by calculating the reconstruction error on both neurotypical and neurodivergent samples from our validation datasets, summarized in Table 1. The L1 reconstruction error for the ASD validation set is higher than that of the TD set. This difference implies that VAEs reconstruct ASD samples in a manner that makes them resemble TD samples. For the CVAEs, we conducted a similar analysis. Given that the CVAE was trained on both ASD and TD samples, we first computed the L1 reconstruction error for the ASD samples. We then compared this with the error for the synthetically generated outputs, which used a target conditional embedding based on a TD sample. The results, presented in Table 2, show that the reconstruction error for the synthetic samples exceeds that of the reconstructed ASD samples. This disparity indicates that the conditioning mechanism is effective in detecting certain divergences within the data.
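The three reconstruction measures referred to in Tables 1 and 2 (cosine similarity, PCC, and average L1 error) can be written down compactly. The following is a minimal sketch assuming each sample and its reconstruction are NumPy arrays of equal shape; the helper names are hypothetical, and the noise levels are invented purely to mimic a smaller error on TD samples than on ASD samples, so they are not results from this disclosure.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened arrays."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_cc(a, b):
    """Pearson correlation coefficient (PCC) between two flattened arrays."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def l1_error(a, b):
    """Average L1 (mean absolute) error between a sample and its reconstruction."""
    return float(np.mean(np.abs(a - b)))

rng = np.random.default_rng(0)
td_sample = rng.standard_normal((50, 12))
asd_sample = rng.standard_normal((50, 12))

# Stand-ins for decoder output: the noise scales below are made up for illustration.
td_recon = td_sample + 0.1 * rng.standard_normal(td_sample.shape)
asd_recon = asd_sample + 0.3 * rng.standard_normal(asd_sample.shape)

td_l1 = l1_error(td_sample, td_recon)
asd_l1 = l1_error(asd_sample, asd_recon)
error_gap = asd_l1 - td_l1  # a positive gap mirrors the ASD > TD pattern described above
```

In practice these scores would be averaged over all validation samples of each subgroup before being compared.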
In the VAE systems (
One notable consequence of the dataset's bias is exemplified by the consistent trend of over-connectivity between the salience and limbic networks in the male population, which is reversed in females for all of the models (
In the CVAE systems (
Interpreting the chord plots and discerning the extent to which the CVAE mitigated sex-related influences is challenging. As outlined in Section 3.4, the neurodivergence identified by the CVAE is expected to have a lower correlation with sex labels than that identified by the VAE. To measure this, we quantitatively assessed the similarity between the pairwise correlations underpinning these chord plots (Table 3). This similarity score revealed that all the conditional models have a higher overlap between male and female neurodivergence than the unconditional models. The conditional hybrid model had the highest similarity between female and male pairwise correlation matrices, suggesting the least sex-biased FC neurodivergence patterns. However, the CNN model showed the largest increase in similarity when the conditional embedding was added, which indicates that CNN layers are particularly sensitive to the inclusion of conditional embedding.
Table 3 shows the similarity between male and female FC pairwise correlations for VAE and CVAE systems.
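One way such a male-female similarity score could be computed is sketched below. It assumes the comparison is a cosine similarity restricted to the off-diagonal upper triangle of each symmetric network-by-network matrix; the disclosure does not state whether the full matrix or only the unique pairs were used, and the 3-network matrices are toy values, not data from Table 3.

```python
import numpy as np

def matrix_similarity(fc_a, fc_b):
    """Cosine similarity over the unique off-diagonal entries of two symmetric matrices."""
    iu = np.triu_indices_from(fc_a, k=1)
    a, b = fc_a[iu], fc_b[iu]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy connectivity-difference matrices for three networks (illustrative values only).
female = np.array([[1.0, 0.2, -0.4],
                   [0.2, 1.0, 0.1],
                   [-0.4, 0.1, 1.0]])
male_similar = np.array([[1.0, 0.3, -0.3],
                         [0.3, 1.0, 0.1],
                         [-0.3, 0.1, 1.0]])
# Same magnitudes as `female` but with every off-diagonal sign flipped.
male_reversed = -female + 2.0 * np.eye(3)

score_similar = matrix_similarity(female, male_similar)
score_reversed = matrix_similarity(female, male_reversed)
```

Under this convention a score near 1 would indicate that the same network pairs diverge in the same direction for both sexes, while a negative score would indicate reversed trends.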
In this system 100, we investigated the application of generative models to FC analysis in the context of ASD with fMRI data. Our exploration began with a comprehensive assessment of the reconstructive abilities of various VAE architectures, using neurotypical samples as the input data.
The CNN-based VAE and CVAE are more effective in reconstruction and conditional generation of synthetic data. This conclusion has been derived from the combination of the metrics and results provided in Section 3. First and foremost, cosine similarity and PCC measures for CNN VAE and CNN CVAE reconstruction are higher compared to the other models. Even though introducing phenotype data has improved both reconstruction in higher dimensional space and discrimination in lower-dimensional space for all of the models, it is worth highlighting that the CNN model has demonstrated superior reconstruction performance with conditioning, as evidenced by the highest observed increase in PCC by comparing Tables 1 and 2.
Secondly, in Table 3, the CNN model stands out with the highest increase in similarity between female and male connectivity, almost doubling the improvement seen in other models. These findings collectively indicate that the CNN model exhibits heightened sensitivity to conditioning mechanisms in comparison to other models.
One plausible explanation for this could lie in the way the condition is introduced to the model. In all models, the conditional embedding is concatenated as another feature, so the conditional embeddings carry no explicit temporal ordering. Consequently, the CNN handled this conditioning mechanism better than the RNN model. We interpret this to mean that it is more effective for the VAE to model spatial patterns than temporal ones. We believe that the unconditional CNN in parallel with RNN is better suited to classification applications.
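To make the concatenation mechanism concrete, the sketch below appends a condition vector to every time step of a sample as extra feature columns. The encoding (two binary flags and a scaled age), the `max_age` constant, the time x parcels layout, and both function names are hypothetical; the disclosure only states that the conditional embedding is concatenated as another feature.

```python
import numpy as np

def encode_condition(is_asd, is_female, age_years, max_age=60.0):
    """Hypothetical phenotype encoding: diagnosis flag, sex flag, scaled age."""
    return np.array([float(is_asd), float(is_female), age_years / max_age])

def concat_condition(sample, condition):
    """Append the same condition vector to every time step.

    sample: (time, parcels); condition: (d,); returns (time, parcels + d).
    """
    tiled = np.tile(condition, (sample.shape[0], 1))
    return np.concatenate([sample, tiled], axis=1)

rng = np.random.default_rng(0)
sample = rng.standard_normal((50, 12))

# Source condition: 12-year-old female with ASD.
source_input = concat_condition(sample, encode_condition(True, True, 12))
# Target condition: same sex and age, diagnosis switched to neurotypical.
target_input = concat_condition(sample, encode_condition(False, True, 12))
```

Because the condition columns are constant along the time axis, they contain no temporal structure, which is consistent with the explanation offered above for why a recurrent model may gain less from them.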
As shown in
To provide initial validation for the decoder architectures, we calculated the Mean Absolute Error for the reconstruction of the subgroup that was present during training and of the new sample subgroup. VAEs had higher reconstruction errors for ASD samples than for TD samples, indicating their ability to model ASD samples as resembling TD ones. For CVAEs, which were trained on both ASD and TD samples, we computed the reconstruction loss for ASD samples. Comparing this loss with that of synthetically generated outputs using a TD-based target conditional embedding, we found higher reconstruction errors for the synthetic samples. This finding also suggests that the conditioning mechanism effectively detects neurodivergence and can make the generation process more targeted. Moreover, in comparison to the work of Kim et al. [22], our VAE models demonstrate better performance in data reconstruction, with reconstruction errors ranging from 0.06 to 0.07, while Kim et al. reported a range of 1.2 to 1.7.
Next, we proceeded to FC analysis with the trained VAEs and CVAEs. We consistently identified under-connectivity between the limbic and DMN networks across most VAE systems, which is consistent with previous findings in the literature. The trend of higher connectivity between salience and limbic networks in the male population compared to the female population was identified by all VAEs and CVAEs. In the study by Green et al. [42], where the studied group consisted predominantly of males, the authors concluded that male individuals tend to exhibit over-connectivity between salience and limbic networks. We extend these findings and show that this trend does not hold in the female population.
One of the findings in the previous literature is that males tend to have decreased under-connectivity with the DMN network compared to females [44]. Based on our analysis, both the VAE and CVAE revealed this pattern as well, specifically between DMN and somatomotor networks. Due to the limitations of Schaefer's atlas, we have focused on exploring between-network connectivity. In future work, we aim to expand the investigation to other atlas configurations and to within-network analysis. All of the above are consistent with the connectivity patterns that have been reported previously in the literature, summarized in Section 2.4.
It was hypothesized that adjusting the conditional embedding would reduce sex-related bias in the models and potentially result in sex-independent FC. By evaluating the pairwise connectivity matrix overlap between female and male subgroups, we conclude that patterns discerned through the CVAE have a reduced correlation with sex labels. We believe the remaining differences shown in the chord plots between males and females in CVAE systems are primarily due to the age difference and the diverse nature of the disorder.
In recent years, many studies have explored the capabilities of generative models (including VAEs, GANs, and Diffusion flow models) in the medical domain. However, many models are found to struggle with at least one of the following: high-quality outputs, mode coverage, sample diversity, and computational costs [50]. VAEs are probabilistic models, which makes them well-suited for modeling and generating complex distributions. As shown in this disclosure, VAEs can learn the underlying probability distribution of the input data, allowing for probabilistic sampling and interpolation. However, as stated in previous works, VAEs tend to suffer from comparatively low quality in generation compared to GANs or Diffusion flow models [50]. Therefore, our future work will also investigate different generative frameworks to improve the quality of generated samples and develop methods for assessing them.
The present system and method provide a novel approach to FC analysis of fMRI data using a generative model such as the VAE. We also determined whether introducing additional phenotype data to the model would reduce bias and increase the generalizability of the FC analysis. Our main finding is that the CNN-based model 113, 116, 115 was the most effective architecture for the FC analysis, as it showed superior performance in reconstruction with and without conditional information. We show that introducing phenotypic data as conditional embeddings to the model generally improves reconstruction performance and reduces bias in FC analysis. Conditioning had the greatest effect on the results of the CNN model; however, the CNN model parallel with RNN (
The following abbreviations are used in this manuscript:
The following references are hereby incorporated by reference.
In the embodiments shown, a processing device can be provided to perform various functions and operations in accordance with the disclosure. The processing device can be, for instance, a computer, personal computer (PC), server or mainframe computer, or more generally a computing device, processor, application specific integrated circuits (ASIC), or controller. The processing device can be provided with, or be in communication with, one or more of a wide variety of components or subsystems including, for example, data processing devices and subsystems, wired or wireless communication links, user-actuated (e.g., voice or touch actuated) input devices (such as touch screen, keyboard, mouse) for user control or input, monitors for displaying information to the user, and/or storage device(s) such as memory, RAM, ROM, DVD, CD-ROM, analog or digital memory, flash drive, database, computer-readable media, floppy drives/disks, and/or hard drive/disks. All or parts of the system, processes, and/or data utilized in the system of the disclosure can be stored on or read from the storage device(s). The storage device(s) can have stored thereon machine executable instructions for performing the processes of the disclosure. The processing device can execute software that can be stored on the storage device. Unless indicated otherwise, the process is preferably implemented automatically by the processor substantially in real time without delay.
As used herein, when an element or feature is described as being “configured,” that element or feature is structurally arranged or formed to accomplish the stated purpose. As used with respect to a processing device (e.g., computer), the term “configured” or “configured to” means that the processing device is structurally arranged or ordered (e.g., by supplying, arranging or connecting a specific set of internal or external components or modules, for example that perform certain operations) to accomplish the stated purpose or task.
The description and drawings of the present disclosure provided in the disclosure should be considered as illustrative only of the principles of the disclosure. The disclosure may be configured in a variety of ways and is not intended to be limited by the preferred embodiment. Numerous applications of the disclosure will readily occur to those skilled in the art. Therefore, it is not desired to limit the disclosure to the specific examples disclosed or the exact construction and operation shown and described. Rather, all suitable modifications and equivalents may be resorted to, falling within the scope of the disclosure.
This application claims the benefit of priority of U.S. Provisional Application No. 63/539,176, filed Sep. 19, 2023, the entire content of which is relied upon and incorporated herein by reference in its entirety.
This invention was made with Government support under Grant/Contract No. 170774 awarded by NSF. The Government has certain rights in this invention.
Number | Date | Country
---|---|---
63539176 | Sep 2023 | US