The present disclosure provides methods and systems related to artificial intelligence for early recognition of rare skin diseases. In particular, the present disclosure provides methods and systems for early recognition of squamous cell carcinoma (SCC) in epidermolysis bullosa and methods and systems for treating and/or preventing SCCs.
Recessive dystrophic epidermolysis bullosa (RDEB) is a rare hereditary skin condition. The skin is fragile, minor injuries lead to blisters and wounds, and patients suffer from frequent pain and infections. As a severe complication, patients with RDEB develop skin cancer, particularly squamous cell carcinoma (SCC). SCCs in RDEB occur at a young age and are aggressive, and no non-invasive tool is available for their detection. Early discovery is key, but recognizing SCC is difficult. Tools are needed to analyze photographs of RDEB skin and to identify SCCs.
Embodiments of the present disclosure include a computer-based method comprising a) imaging a lesion on human skin selected from the group consisting of: (i) epidermolysis bullosa (EB) skin; (ii) non-EB skin; (iii) squamous cell carcinoma (SCC) skin; (iv) non-SCC skin; (v) EB SCC skin; and (vi) non-EB SCC skin; to provide an imaged human skin; b) annotating the lesion of the imaged human skin; c) assessing a potential for a disease of the imaged human skin; and d) choosing a therapy for the imaged human skin.
In some embodiments, the method further comprises a smartphone application.
In some embodiments, the assessing a potential for a disease of the imaged human skin comprises collection of electronic health data.
Embodiments of the present disclosure also include a computer-based system comprising a) one or more imaging devices; b) one or more communication components; c) one or more memory components; d) one or more information processing components configured to analyze data; and e) a protocol component; wherein the computer-based system (i) assesses a potential for a disease of the imaged human skin; (ii) chooses a therapy for the imaged human skin; and (iii) performs one or more actions on the imaged human skin.
In some embodiments, the one or more communication components comprise an input device and an output device.
In some embodiments, the protocol component comprises one or more protocols for: (i) imaging a processed human skin to provide an imaged human skin; (ii) annotating one or more lesions of the imaged human skin; (iii) assessing a potential for malignancy of the imaged human skin; and (iv) choosing a therapy for the imaged human skin.
In some embodiments, the one or more information processing components or the protocol component alters protocols based on responses from software.
In some embodiments, the assessing a potential for a disease of the imaged human skin comprises collection of electronic health data.
Embodiments of the present disclosure also include a method of treating a subject who is suffering from a skin condition, the method comprising providing the subject with the computer-based system comprising a) one or more imaging devices; b) one or more communication components; c) one or more memory components; d) one or more information processing components configured to analyze data; and e) a protocol component; wherein the computer-based system assesses a potential for malignancy of the imaged human skin and performs actions on the imaged human skin.
In some embodiments, provided herein are methods of developing an artificial intelligence (AI) tool for distinguishing between two medical conditions, the method comprising: (a) training the AI tool on a large-scale hierarchical image database to generate an initially trained model with robust non-specific feature-detection capabilities; (b) adapting the initially trained model to the medical conditions by transfer learning to binary classification of images of the two medical conditions to generate a model trained on the two medical conditions; (c) combining output of the model trained on the two medical conditions with patient clinical data to generate a combined model; and (d) training a Random Forest-based meta-learner classifier with the combined model to generate the AI tool for distinguishing between the two medical conditions. In some embodiments, the large-scale hierarchical image database is not specific to images of the two medical conditions. In some embodiments, the image is obtained by a non-medical professional. In some embodiments, methods further comprise applying augmentation techniques to expand diversity of the images of the two medical conditions used to train the AI tool. In some embodiments, the augmentation techniques comprise transformations of the images of the two medical conditions used to train the AI tool. In some embodiments, transformations of the images comprise one or more of rotations, flips, scaling, and color adjustments. In some embodiments, the two medical conditions are skin conditions. In some embodiments, the two medical conditions can be differentiated visually. In some embodiments, the two medical conditions are (1) a squamous cell carcinoma (SCC) lesion in a subject suffering from epidermolysis bullosa and (2) a non-SCC lesion in a subject suffering from epidermolysis bullosa. In some embodiments, the subject suffers from recessive dystrophic epidermolysis bullosa (RDEB). In some embodiments, provided herein are methods of determining whether a subject suffers from a first or second medical condition, the method comprising: (a) obtaining an image of a region of the subject displaying a symptom of the medical condition; (b) providing the image to an AI tool developed by the methods herein; and (c) determining whether the subject suffers from the first or second condition.
In some embodiments, provided herein are methods of developing an artificial intelligence (AI) tool for distinguishing between (1) a squamous cell carcinoma (SCC) lesion in a subject suffering from epidermolysis bullosa (EB) and (2) a non-SCC lesion in a subject suffering from EB, the method comprising: (a) training the AI tool on a large-scale hierarchical image database not limited to SCC lesions and/or subjects suffering from EB to generate an initially trained model with robust non-specific feature-detection capabilities; (b) adapting the initially trained model to the medical conditions by transfer learning to binary classification of images of SCC lesions and non-SCC lesions in subjects suffering from EB to generate a specific model trained on SCC lesions and non-SCC lesions in subjects suffering from EB; (c) combining output of the specific model with patient clinical data to generate a combined model; and (d) training a Random Forest-based meta-learner classifier with the combined model to generate the AI tool for distinguishing between SCC lesions and non-SCC lesions in subjects suffering from EB. In some embodiments, the subject suffers from recessive dystrophic epidermolysis bullosa (RDEB).
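By way of illustration, the four-step development method may be sketched as follows, assuming a Python environment with TensorFlow/Keras and scikit-learn; the dropout rate, tree count, and placeholder arrays are illustrative assumptions rather than a prescribed configuration.

```python
# A minimal sketch of steps (a)-(d), assuming TensorFlow/Keras and scikit-learn;
# the dropout rate and placeholder arrays are illustrative assumptions.
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# (a) Initially trained model: ImageNet weights stand in for the large-scale
#     hierarchical image database, giving non-specific feature detection.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

# (b) Transfer learning to binary classification of the two conditions.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                    # assumed rate
    tf.keras.layers.Dense(1, activation="sigmoid"),  # condition 1 vs. condition 2
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(train_images, train_labels, ...)         # placeholder training data

# (c) Combine the image model's output with patient clinical data.
def combined_features(images, clinical):
    p = model.predict(images).ravel()                # per-image probability
    return np.column_stack([p, clinical])            # combined representation

# (d) Train a Random Forest-based meta-learner on the combined features.
meta = RandomForestClassifier(n_estimators=500, class_weight="balanced")
# meta.fit(combined_features(train_images, train_clinical), train_labels)
```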
In some embodiments, provided herein are methods of distinguishing between an SCC lesion and a non-SCC lesion in a subject suffering from EB, the method comprising: (a) obtaining an image of the lesion; (b) providing the image to an AI tool developed by the methods herein; and (c) determining whether the subject suffers from an SCC lesion or a non-SCC lesion. In some embodiments, methods further comprise treating the subject for an SCC lesion or a non-SCC lesion based on the outcome of step (c).
In some embodiments, provided herein are methods of determining whether a lesion on a subject suffering from epidermolysis bullosa (EB) is a squamous cell carcinoma (SCC) lesion or a non-SCC lesion, the method comprising: (a) obtaining an image of the lesion; (b) providing the image to an AI tool trained to distinguish between SCC lesions and non-SCC lesions on a subject suffering from EB; and (c) determining whether the lesion is an SCC lesion or a non-SCC lesion. In some embodiments, methods further comprise administering appropriate treatment for an SCC lesion or non-SCC lesion.
In some embodiments, provided herein is software (e.g., an app) or a device (e.g., a computer, tablet, phone, etc.) containing an AI tool described herein. In some embodiments, provided herein is software (e.g., an app) or a device (e.g., a computer, tablet, phone, etc.) for performing a method herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In case of conflict, the present document, including definitions, will control. Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
“Correlated to” as used herein means “compared to.”
The terms “administration of” and “administering” a composition as used herein refers to providing a composition of the present disclosure to a subject in need of treatment (e.g., antiviral treatment). The compositions of the present disclosure may be administered by oral, parenteral (e.g., intramuscular, intraperitoneal, intravenous, ICV, intracisternal injection or infusion, subcutaneous injection, nebulization, or implant), by inhalation spray, nasal, vaginal, rectal, sublingual, or topical routes of administration and may be formulated, alone or together, in suitable dosage unit formulations containing conventional non-toxic pharmaceutically acceptable carriers, adjuvants and vehicles appropriate for each route of administration.
The term “artificial intelligence” as used herein refers to a computer performing tasks that usually require human intelligence (e.g., object recognition and complex decision-making).
The term “natural language processing” as used herein refers to processing text or spoken language to make information accessible to computer algorithms.
The term “machine learning” as used herein refers to computers learning from provided data (e.g., human-labelled training data and/or data with detectable labels) to gather insights and make predictions (e.g., an algorithm that clusters data according to its intrinsic features).
As used herein, the term “transfer learning” refers to machine learning techniques that use pre-trained models to improve performance on a related task. In transfer learning, a model is pre-trained on one task and then fine-tuned for a related task. The model's components are reused to create a new model for the related task.
As used herein, the term “random forest analysis” refers to a computational method that uses multiple different decision trees and selects the class predicted most often overall (the mode). In a specific application herein, the mode will be either RDEB-SCC or RDEB-non-SCC. The class predicted by the majority of trees is selected as the predicted class for the sample.
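As a toy illustration of the mode-based vote, assuming scikit-learn (the features and labels below are random placeholders):

```python
# Toy example: the forest's predicted class is the mode of the tree votes
# (1 = RDEB-SCC, 0 = RDEB-non-SCC); features and labels are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)
forest = RandomForestClassifier(n_estimators=11, random_state=0).fit(X, y)

sample = X[:1]
votes = [int(tree.predict(sample)[0]) for tree in forest.estimators_]
mode = max(set(votes), key=votes.count)  # class predicted by the majority of trees
```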
The term “convolutional neural networks” for images and deep learning as used herein refers to networks modeled on how the nervous system processes information, with multiple layers of interconnected nodes where each node performs calculations based on input from other connected nodes (e.g., a network with multiple hidden layers is a deep neural network).
The term “machine learning stages” as used herein refers to (a) preprocessing (e.g., finding missing values, resolving inconsistencies, restructuring data); (b) splitting data into a training set used to generate an algorithm (repeating to improve performance) and a smaller test set; and (c) choosing one or more ML algorithms based on the problem and data (usually multiple are tried), choosing the best-performing model, and validating it on the test set of unseen data.
As used herein, the terms “subject” and “patient” are used interchangeably to refer to any vertebrate, including, but not limited to, a mammal (e.g., cow, pig, camel, llama, horse, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, and mouse), a non-human primate (e.g., a monkey, such as a cynomolgus or rhesus monkey, chimpanzee, macaque, etc.), and a human. In some embodiments, the subject may be a human or a non-human. In one embodiment, the subject is a human. The subject or patient may be undergoing various forms of treatment.
As used herein, the terms “treat,” “treating,” and “treatment” are used interchangeably to describe reversing, alleviating, or inhibiting the progress of a disease and/or injury, or one or more symptoms of such disease, to which such term applies. Depending on the condition of the subject, the term also refers to preventing a disease, and includes preventing the onset of a disease, or preventing the symptoms associated with a disease (e.g., viral infection). A treatment may be either performed in an acute or chronic way. The term also refers to reducing the severity of a disease or symptoms associated with such disease prior to affliction with the disease. Such prevention or reduction of the severity of a disease prior to affliction refers to administration of a treatment to a subject that is not at the time of administration afflicted with the disease. “Preventing” also refers to preventing the recurrence of a disease or of one or more symptoms associated with such disease.
Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. For example, any nomenclatures used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those that are well known and commonly used in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
The present disclosure provides methods and systems related to artificial intelligence for early recognition of rare skin diseases. In particular, the present disclosure provides methods and systems for early recognition of squamous cell carcinoma in epidermolysis bullosa and methods and systems for treating and/or preventing SCCs.
Epidermolysis bullosa is a group of rare genodermatoses affecting approximately 500,000 individuals globally, characterized by skin fragility and blistering. Recessive dystrophic epidermolysis bullosa (RDEB) is a severe subtype often complicated by squamous cell carcinoma (SCC), the leading cause of death in these patients. RDEB-SCC diagnosis is challenging, as symptoms often resemble other RDEB-related skin changes. Additionally, patient reluctance for full-body examination and biopsies complicates timely and accurate diagnosis. The aggressive nature of RDEB-SCCs with local destruction and metastases severely limits effective treatment options, emphasizing the critical need for early detection.
Current diagnostic challenges are compounded by the scarcity of dermatologists specialized in genodermatoses like RDEB, posing significant hurdles for routine and specialized care. Residual networks have shown proficiency in identifying skin cancers, such as melanoma, at a level comparable to expert dermatologists. Earlier AI developments have focused on dermoscopic images of “normal” skin, but applying these to macroscopic images of complex, rare diseases like RDEB is unprecedented. Adapting AI for RDEB has the potential to aid physicians in early cancer detection, streamlining the care pathway. By democratizing access to expert-level diagnostic capabilities, AI also addresses health inequalities arising from geographic and resource disparities.
Rare diseases such as RDEB present unique challenges in healthcare, primarily due to the scarcity of expertise and resources needed to diagnose and treat these conditions effectively. With fewer than 12% of machine learning studies on rare diseases evaluating their algorithms externally or against human expertise, there is a pressing need to improve the access and quality of diagnostics worldwide. AI applications for rare diseases often fail to transition from theoretical models to clinical practice. This gap is primarily due to the lack of external validation in many diagnostic tools designed for rare diseases, a challenge compounded by the difficulty in obtaining and sharing rare disease datasets without breaching patient privacy and involving patient advocacy groups. By conducting external validation using a separate test set and comparing AI performance against RDEB clinical experts, the AI tools herein overcome limitations that previous tools have faced.
Transfer learning has emerged as a key strategy in dealing with the small and imbalanced datasets typical of rare diseases. It mitigates bias and overfitting, ensuring the AI models are robust and generalizable. The choice of leveraging transfer learning contributes to ongoing advancements in improving diagnostic accuracy and availability for patients with rare diseases through innovative technologies. In some embodiments, the AI-based RDEB-SCC diagnostic tool developed herein currently serves Fitzpatrick's skin types I-III within specific clinical photography settings. Embodiments within the scope herein include a wider variety of skin types and amateur-obtained images.
In certain embodiments, the models developed for diagnostic purposes herein utilize both complete images (i.e., images with original backgrounds) and images with background elements removed.
In some embodiments, provided herein are AI meta-learners, leveraging Random Forest and Residual Network algorithms, that match the diagnostic accuracy of international specialists in detecting SCC lesions in RDEB patients. The tools herein combine clinical images and patient data, enhancing healthcare delivery and broadening access to specialized oncology and rare disease expertise.
Embodiments of present disclosure provide methods and systems related to artificial intelligence for early recognition of rare skin diseases. In particular, the present disclosure provides methods and systems for early recognition of squamous cell carcinoma in epidermolysis bullosa and methods and systems for treating and/or preventing SCCs. Section headings as used in this section and the entire disclosure herein are merely for organizational purposes and are not intended to be limiting.
Embodiments of the present disclosure include a computer-based method comprising a) imaging a processed skin (e.g., animal skin or human skin (e.g., diseased human skin (e.g., atopic dermatitis, cold sores, dry skin, psoriasis, vitiligo, contact dermatitis, rosacea, melasma, pemphigus, acne, poison ivy, poison oak, epidermolysis bullosa (EB), squamous cell carcinoma))) to provide an imaged human skin (e.g., by whole slide imaging with a scanner or camera); b) annotating one or more lesions of the imaged human skin (e.g., manually or automatically annotating using labels and classification); c) assessing a potential for a disease of the imaged human skin (e.g., rare skin diseases (e.g., RDEB SCCs, Argyria, Erythropoietic Protoporphyria, Harlequin Ichthyosis, Elastoderma, Interstitial Granulomatous Dermatitis, Pemphigus, Acral Peeling Skin Syndrome)); and d) choosing a therapy for the imaged human skin.
In some embodiments, the method further comprises an application (e.g., a computer application configured for a phone, a mobile device, a small computer, or a large computer).
In some embodiments, the assessing a potential for a disease of the imaged human skin comprises collection of electronic health data (e.g., patient identifiers, operative notes, demographics, diagnoses, medications, procedures, laboratory data, vital signs, utilization of medical care, cost of medical care, or digital health records (e.g., all data hosted on a physician's own servers, storage of data from the physician to a third party, or remote systems)).
Embodiments of the present disclosure also include a method of treating a subject who is suffering from a skin condition, the method comprising providing the subject with the computer-based system comprising a) one or more imaging devices; b) one or more communication components; c) one or more memory components; d) one or more information processing components configured to analyze data; and e) a protocol component; wherein the computer-based system assesses a potential for disease of the imaged human skin and performs actions on the imaged human skin.
Embodiments of the present disclosure also include a computer-based system comprising a) one or more imaging devices; b) one or more communication components; c) one or more memory components; d) one or more information processing components configured to analyze data; and e) a protocol component; wherein the computer-based system (i) assesses a potential for a disease of the imaged human skin; (ii) chooses a therapy for the imaged human skin; and (iii) performs one or more actions on the imaged human skin.
In some embodiments, the one or more communication components comprise an input device and an output device.
In some embodiments, the protocol component comprises one or more protocols for: (i) imaging a processed human skin to provide an imaged human skin; (ii) annotating one or more lesions of the imaged human skin; (iii) assessing a potential for malignancy of the imaged human skin; and (iv) choosing a therapy for the imaged human skin.
In some embodiments, the one or more information processing components or the protocol component alters protocols based on responses from software.
In some embodiments, the assessing a potential for a disease of the imaged human skin comprises collection of electronic health data.
In some embodiments, methods herein comprise treating an SCC lesion in a subject suffering from RDEB (an RDEB-SCC lesion). Methods of treating an SCC lesion include surgical removal of the lesion (e.g., Mohs surgery, shave excision, standard excision, etc.), radiation therapy, electrodesiccation and curettage, topical chemotherapy treatment (e.g., fluorouracil), etc.
It will be readily apparent to those skilled in the art that other suitable modifications and adaptations of the methods of the present disclosure described herein are readily applicable and appreciable, and may be made using suitable equivalents without departing from the scope of the present disclosure or the aspects and embodiments disclosed herein. Having now described the present disclosure in detail, the same will be more clearly understood by reference to the following examples, which are merely intended only to illustrate some aspects and embodiments of the disclosure, and should not be viewed as limiting to the scope of the disclosure. The disclosures of all journal references, U.S. patents, and publications referred to herein are hereby incorporated by reference in their entireties.
The present disclosure has multiple aspects, illustrated by the following non-limiting examples.
Using artificial intelligence (AI) to train a computer to recognize SCC on EB skin. Several international reference centers joined forces to amass a large number of photographs of RDEB SCCs and RDEB skin. The photographs are used to train an AI tool to take and analyze photographs of RDEB skin and to identify RDEB-SCCs.
The AI tool is suitable for use at clinical centers and at home and will aid clinicians in deciding whether to take a biopsy and which skin area to biopsy. The AI tool is also useful for patients and families between clinic visits, enabling them to capture cutaneous images for review at the visit without going through a time-consuming and painful full bandage change in the doctor's office. Image exchange between patients and their physician is valuable for patients with difficult access to medical expertise and those who are hesitant to allow full skin examination at each visit, often because of pain or fear of biopsies.
Sizable EB populations are merged to amass a large cohort of patients and their images. A joint, anonymized data organization system (using REDCap and SharePoint) is created between the cooperating centers to establish a strong data pipeline. The globally collected sets of photographs of i) RDEB-SCCs and ii) other skin findings in RDEB (e.g., wounds and hyperkeratosis) are used to create a development set and a validation set. A total of i) >500 RDEB-SCC pictures and ii) >1500 pictures of other RDEB skin findings is compiled.
The image classification model is also trained by transfer learning, using images of head and neck cutaneous SCCs in non-EB patients. These share certain molecular characteristics with RDEB-SCC and, although different in course, location, and age of affected individuals, are similar enough in appearance that the classifier can be trained first on these images to identify likely SCC and then be applied to RDEB-SCCs. A large image set of >3000 head and neck cutaneous SCCs is compiled from general dermatology, skin surgery, and surgical oncology units. To further enhance the model performance, demographic and clinical features, in addition to image data, are included.
A deep learning network is trained on the development set to recognize RDEB-SCCs and to distinguish them from other RDEB skin findings. Model development and refinement is performed in cooperation with a multi-disciplinary team of informatics experts and computer scientists. Initial models are developed using convolutional neural networks (CNNs).
To validate the model and to assess its accuracy, the performance of the deep learning network is compared against that of a group of international EB experts. These experts are recruited and include at least 30 individuals. Experts are asked to indicate their experience with EB in years (0-2 years, 2-5 years, >5 years, >10 years). A validation set consisting of 50 images not used in the training phase is created. Participants and the deep learning network are asked for a dichotomous diagnosis (“RDEB-SCC” or “other RDEB skin finding”) on the images of the validation set. Analysis employs descriptive statistics, and two-sided t-tests assess the difference between the performances of the experts and the AI application. The significance level is set at P<0.05.
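A minimal sketch of the planned statistical comparison, assuming SciPy; the per-reader accuracy values below are hypothetical placeholders, not study results.

```python
import numpy as np
from scipy import stats

expert_acc = np.array([0.72, 0.68, 0.80, 0.75, 0.66])  # hypothetical per-expert accuracies
ai_acc = np.array([0.78, 0.81, 0.76, 0.79, 0.80])      # hypothetical per-run AI accuracies

print("experts:", expert_acc.mean(), "+/-", expert_acc.std(ddof=1))  # descriptive statistics
t, p = stats.ttest_ind(expert_acc, ai_acc)             # two-sided t-test by default
print(f"t = {t:.2f}, P = {p:.3f}, significant at P < 0.05: {p < 0.05}")
```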
A smartphone application (app) is created that incorporates the deep learning network intended for use both by physicians at clinical centers and by individuals affected by RDEB and their families at home. For the latter purpose, families with teenagers and adults with RDEB are contacted via the patient organization DEBRA International (at least 10 patients/families) to test the app and provide feedback on ease of use, information exchange, and quality of images.
In patients with recessive dystrophic epidermolysis bullosa (RDEB), distinguishing squamous cell carcinoma (SCC) lesions from other skin changes is challenging even for experienced dermatologists, yet early detection improves survival outcomes. Experiments were conducted during development of embodiments herein to develop and validate an artificial intelligence (AI) tool to effectively detect carcinoma in patients with RDEB using images and clinical data.
A retrospective diagnostic study analyzed routinely collected patient data from 11 international dermatology sites, using deep learning models trained, fine-tuned, and cross-validated on 741 biopsy-confirmed macroscopic photographs of RDEB-associated SCCs and non-SCCs. These models, enhanced with patient clinical data through Random Forest models, were evaluated against an additional 181 biopsy-confirmed images. The AI's diagnostic accuracy was compared to both individual and consensus assessments from 21 international EB-expert physicians across five datasets, each containing 30-49 images.
Primary outcomes measured diagnostic accuracy of meta-learner models, individual clinicians, and dermatologist consensus through sensitivity, specificity, and the area under the curve (AUC) compared to biopsy-confirmed diagnoses. Secondary outcomes assessed diagnostic agreement among 21 international dermatologists and AI models, using Fleiss Kappa for clinicians and Cohen's Kappa for AI versus clinician consensus. Another secondary outcome assessed the diagnostic accuracy of meta-learner models trained on images with backgrounds removed.
Clinician diagnostic agreement was fair-to-moderate (Fleiss Kappa=0.33±0.12), with AUC values ranging from 0.57 to 0.87; the majority dermatologist vote produced an average AUC of 0.71±0.04, comparable to AI performance. Meta-learner models based on ResNet152 and ResNet50 showed AUCs of 0.77±0.14 and 0.71±0.06 on smaller clinician-evaluated datasets, and 0.76 and 0.81 on an expanded set of 181 images, respectively. Models designed for images with the background removed showed comparable diagnostic accuracy when tested on a dataset of 181 images. The exemplary AI accurately distinguished carcinoma from other lesions in RDEB skin images, matching dermatologist consensus and outperforming individual clinicians. This tool enhances diagnostic precision and accessibility, enables expert-level detection in remote areas, and streamlines continuous cancer monitoring.
A total of 150 patients diagnosed with RDEB were included in the diagnostic model development. Eligibility criteria required participants to be aged 11 years and older, have Fitzpatrick skin types I-III (pale white to darker white skin), and biopsy- or expert-confirmed SCC presence or absence.
Before analysis, all clinical data and biopsy-confirmed lesion photographs (n=1078) were fully anonymized. A total of 156 images deemed of poor quality by at least two dermatologists, hindering accurate diagnostic assessment, were excluded from the dataset.
For all models, images were converted from DICOM to JPG format and standardized to a resolution of 224×224 pixels for consistency. The initial model development and clinician evaluation used JPG images with original backgrounds. A subset of models was additionally trained on images without background; for this, background elements were removed using the Segment Anything Model with subsequent manual validation.
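A minimal preprocessing sketch, assuming pydicom and Pillow and an uncompressed DICOM file; the file paths are placeholders.

```python
import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("lesion.dcm")                     # placeholder path
arr = ds.pixel_array.astype(np.float32)
arr = (arr - arr.min()) / max(arr.max() - arr.min(), 1e-8) * 255.0  # scale to 8-bit range
img = Image.fromarray(arr.astype(np.uint8)).convert("RGB")
img = img.resize((224, 224))                           # standardized resolution
img.save("lesion.jpg", "JPEG")                         # DICOM converted to JPG
```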
Patient characteristics with significant missing data were excluded, and for those retained, missingness indicators were used. This method avoids imputation and maintains data integrity, as missing values, like lesion location in conditions unrelated to webbing, are inherently informative and not random. Patient information was categorized into binary variables based on age groups, degree of webbing, lesion duration, history of cutaneous SCC, and lesion locations (Table 1).
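A minimal sketch of this encoding, assuming pandas; the column names and cut-offs are illustrative, not the study's exact schema (Table 1).

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34.0, 52.0, None],
    "lesion_duration_weeks": [8.0, None, 12.0],
    "history_of_scc": [1, 0, 1],
})

# Missingness indicator instead of imputation: absence is itself informative.
df["duration_missing"] = df["lesion_duration_weeks"].isna().astype(int)

# Binary variables derived from the retained characteristics.
df["age_over_40"] = (df["age"] > 40).astype(int)
df["duration_ge_6_weeks"] = (df["lesion_duration_weeks"] >= 6).astype(int)
```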
ResNet50 and ResNet152 from the Residual Network family were used as base architectures for detecting SCC from images. These models were initially trained on a large dataset featuring over 14 million images across more than 20,000 classes, which equipped them with robust feature-detection capabilities. Utilizing transfer learning, the team adapted these models for SCC identification in RDEB lesional skin by replacing the original top layer with a new one designed for binary classification into RDEB-SCC and RDEB-non-SCC, incorporating dropout and a dense output layer to enhance generalization and prevent overfitting. To further improve robustness, data augmentation techniques such as random flipping, zooming, and rotation were applied.
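A minimal sketch of the described architecture, assuming TensorFlow/Keras; the dropout rate and augmentation magnitudes are assumptions, as the text specifies the techniques but not the exact settings.

```python
import tensorflow as tf

augment = tf.keras.Sequential([                        # data augmentation, as described
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomZoom(0.1),                   # assumed magnitude
    tf.keras.layers.RandomRotation(0.1),               # assumed magnitude
])

base = tf.keras.applications.ResNet50(                 # or ResNet152
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False                                 # initial layers frozen (see below)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = augment(inputs)
x = base(x)
x = tf.keras.layers.Dropout(0.5)(x)                    # dropout against overfitting
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # RDEB-SCC vs. RDEB-non-SCC
model = tf.keras.Model(inputs, outputs)
```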
The initial layers of the models were frozen to maintain the integrity of the learned features, focusing training on fine-tuning the new layers over 30 epochs using a batch size of 16 images and a learning rate of 0.001, with one-fifth of the data reserved for validation. The Adam optimizer was used for efficient training, and model checkpoints were saved based on the highest validation F1 scores. The training leveraged 5-fold cross-validation, producing five well-performing models. Each model generated an output labeled either as RDEB-SCC or RDEB-non-SCC. Probabilities from these models were averaged to create an ensemble, enhancing prediction reliability. This ensemble output, combined with patient clinical data, was used to train a Random Forest-based meta-learner classifier. This hybrid approach integrates the robust feature extraction of residual models with the ensemble decision-making of Random Forests, optimizing the model's diagnostic performance.
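A minimal sketch of the cross-validated training and ensemble step, assuming TensorFlow/Keras and scikit-learn; `build_model` refers to the architecture sketched above, `images`, `labels`, and `clinical_data` are placeholder arrays, and the checkpoint here monitors validation loss for brevity where the text monitors validation F1 (a custom F1 metric would be needed).

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

fold_models = []
for fold, (tr, va) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(images)):
    m = build_model()                                  # frozen base + new top layers
    m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy")
    m.fit(images[tr], labels[tr],
          validation_data=(images[va], labels[va]),    # one-fifth held out per fold
          epochs=30, batch_size=16,
          callbacks=[tf.keras.callbacks.ModelCheckpoint(
              f"fold{fold}.keras", save_best_only=True)])
    fold_models.append(m)

# Average the five probabilities into a single ensemble output per image,
# then append patient clinical data as input for the meta-learner.
ensemble_p = np.mean([m.predict(images).ravel() for m in fold_models], axis=0)
meta_features = np.column_stack([ensemble_p, clinical_data])
```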
During the meta-learner training, the optimal tree count for the Random Forest was identified by evaluating out-of-bag accuracy over various counts. Recursive Feature Elimination was then used to iteratively refine the feature set (Table 6). The data sets exhibited class imbalances, with ratios of RDEB-SCC to RDEB-non-SCC of 1.63 and 2.42 in the training and testing sets, respectively. To mitigate bias towards the majority class and enhance fairness, the model automatically adjusted the weights inversely to the class frequencies. Further model optimization involved a comprehensive grid search across multiple hyperparameters to fine-tune the settings for improved prediction accuracy. The best parameters were determined through 5-fold cross-validation, ensuring model performance across varied data subsets. The final meta-learner model, trained with optimized parameters, reliably demonstrated its effectiveness through key performance metrics on the withheld test set. The final outputs of the meta-learner models include predicted probabilities of the positive class and their confidence intervals, calculated assuming a normal distribution. For each prediction, the standard deviation is computed from the probability's inherent variance. A Z-score for the 95% confidence level determines the margin of error, which defines the lower and upper bounds of the confidence interval around each predicted probability.
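A minimal sketch of these tuning steps, assuming scikit-learn; `X_meta` and `y` are the combined features and labels from the step above, and the candidate tree counts, selected feature count, and search grid are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV

# Optimal tree count via out-of-bag accuracy.
oob = {n: RandomForestClassifier(n_estimators=n, oob_score=True,
                                 class_weight="balanced", random_state=0)
          .fit(X_meta, y).oob_score_
       for n in (100, 300, 500, 1000)}
best_n = max(oob, key=oob.get)

# Recursive Feature Elimination to iteratively refine the feature set.
rfe = RFE(RandomForestClassifier(n_estimators=best_n, class_weight="balanced"),
          n_features_to_select=8).fit(X_meta, y)       # feature count is illustrative
X_sel = rfe.transform(X_meta)

# Grid search over hyperparameters with 5-fold cross-validation;
# class_weight="balanced" adjusts weights inversely to class frequencies.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=best_n, class_weight="balanced"),
    {"max_depth": [None, 5, 10], "min_samples_leaf": [1, 3, 5]}, cv=5).fit(X_sel, y)

# 95% confidence interval around each predicted probability (normal assumption).
p = grid.best_estimator_.predict_proba(X_sel)[:, 1]
sd = np.sqrt(p * (1 - p))                              # std from the probability's variance
margin = 1.96 * sd                                     # Z-score for the 95% level
lower, upper = np.clip(p - margin, 0, 1), np.clip(p + margin, 0, 1)
```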
The Mean of Absolute Shapley Additive Explanations (SHAP) Values quantifies the average impact of clinical features on the predictions of a Random Forest model, focusing solely on the magnitude of influence. SHAP values were initially computed for each instance and class (SCC/non-SCC) to assess deviations from the baseline. These values were then averaged in absolute terms across all instances to quantify the magnitude of each feature's influence on the model's predictions.
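A minimal sketch of this computation, assuming the `shap` library and the fitted Random Forest meta-learner from the step above (`meta_model` and `X_meta` are placeholders); depending on the shap version, per-class values come back as a list or a 3-D array.

```python
import numpy as np
import shap

explainer = shap.TreeExplainer(meta_model)               # Random Forest meta-learner
sv = explainer.shap_values(X_meta)                       # per-instance, per-class values
sv_scc = sv[1] if isinstance(sv, list) else sv[:, :, 1]  # values for the SCC class

# Mean of absolute SHAP values: average magnitude of each feature's impact.
mean_abs = np.abs(sv_scc).mean(axis=0)
ranking = np.argsort(mean_abs)[::-1]                     # most influential features first
```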
One feature in the Random Forest-based meta-learner is derived from images processed through ResNet50 or ResNet152 models. To identify the specific regions of the image that the model analyzes to detect squamous cell carcinoma (SCC), class activation mapping (CAM) techniques were employed.
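As one common CAM variant, a Grad-CAM-style sketch is shown below, assuming TensorFlow/Keras and a ResNet50 classifier whose convolutional layers are accessible by name; the text does not specify which CAM technique was used, so this is illustrative.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer="conv5_block3_out"):  # ResNet50's last conv block
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])
        score = preds[:, 0]                            # SCC probability
    grads = tape.gradient(score, conv_out)             # sensitivity of the score
    weights = tf.reduce_mean(grads, axis=(1, 2))       # channel-wise importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy() # heat map in [0, 1]
```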
Twenty-one clinicians from EB-Clinet and the North American EB Clinical Research Consortium consented to assess subsets A to E of images using an online REDCap survey. Like the AI models, they received relevant clinical information (Table 4) for each image. Clinicians evaluated the likelihood of SCC in each image. Among the 21 clinicians who evaluated the images, 11 were dermatologists, 8 specialized in pediatric dermatology, and 2 provided healthcare to RDEB patients without a dermatology specialty. Of these clinicians, 66.67% had over 20 years of experience with RDEB, 28.57% had at least 10 years, and one clinician had a minimum of 5 years' experience. Their exposure to SCC varied, with some having treated fewer than 4 cases and others handling over 20 cases of RDEB-SCC. These clinicians practice in 10 different countries with diverse dermatological practices, including Australia, the USA, Canada, the UK, Chile, India, Germany, Switzerland, Austria, and Mexico, reflecting a global spread averaging 4532.595±885.1522 km from the equator.
Primary outcomes measured diagnostic accuracy of meta-learner models, individual clinicians, and dermatologist consensus through sensitivity, specificity, and the area under the curve (AUC) compared to biopsy-confirmed diagnoses. Secondary outcomes assessed diagnostic agreement among clinicians and AI models, using Fleiss Kappa for clinicians and Cohen's Kappa for AI versus clinician consensus. Another secondary outcome assessed the diagnostic accuracy of meta-learner models trained on images with backgrounds removed. All analyses were performed using Python, version 3.9.
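A minimal sketch of the agreement and accuracy metrics, assuming scikit-learn and statsmodels; the rating arrays below are random placeholders standing in for clinician and AI outputs.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(30, 5))           # 30 images x 5 clinicians (toy)
table, _ = aggregate_raters(ratings)                 # counts per category per image
print("Fleiss kappa:", fleiss_kappa(table))          # agreement among clinicians

consensus = (ratings.mean(axis=1) > 0.5).astype(int) # majority-vote diagnosis
ai_pred = rng.integers(0, 2, 30)                     # toy AI diagnoses
print("Cohen kappa:", cohen_kappa_score(ai_pred, consensus))

truth = rng.integers(0, 2, 30)                       # biopsy-confirmed labels (toy)
ai_prob = rng.random(30)                             # AI predicted probabilities (toy)
print("AUC:", roc_auc_score(truth, ai_prob))
```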
Clinical data and images from patients with RDEB, collected over a specified period, were used to develop an AI diagnostic tool to detect SCC. The data were gathered during routine dermatological care before biopsy. The data were organized into a training set of 741 images and a test set of 181 images (Table 4). The training set had a higher female representation (41.97%) compared to the test set (28.73%), a less right-skewed age distribution, and a lower SCC to non-SCC ratio of 1.63, against 2.42 in the test set. Both sets primarily contained lesions observed for at least six weeks, affecting 52.1% of the training and 58.01% of the test datasets. Documentation on the onset of these lesions was lacking for 36.03% and 37.02% of images, respectively. This significant level of missing data is expected due to the chronic and recurrent nature of RDEB, which often hinders consistent record-keeping. In each set, lesions primarily appeared on the hands (32.8% in the training & 40.33% in the test set) and feet (20.38% in the training & 13.81% in the test set), with the remainder evenly distributed across other body areas. Moreover, a majority of patients in both sets exhibited severe digital deformity (complete webbing), with rates of 79.22% in the training and 83.43% in the test sets. A significant proportion of individuals in both the training (62.35%) and test (74.59%) sets reported a history of cutaneous tumors.
The SCC to non-SCC ratios in sub-sets A to E of the test set averaged 1.4±0.09.
Each clinician was tasked with evaluating images and clinical data from one of subsets A to E. Diagnostic agreement among clinicians, exceeding chance levels as illustrated in Table 2, varied from fair (Fleiss' kappa=0.29±0.09 in subsets A, C, D) to moderate (Fleiss' kappa=0.53±0.03 in subsets B and E). Comparative analysis of clinician diagnoses against biopsy results showed a sensitivity of 0.63±0.06, specificity of 0.72±0.2, and an AUC of 0.73±0.08. Clinician performance varied within each sub-set, with individual AUC scores ranging from 0.57 to 0.87. No significant correlations were detected between clinicians' diagnostic accuracy and their specialty or experience with RDEB or SCC, as detailed in Table 5.
Meta-learner models were evaluated on a test set comprising 181 images. The ResNet50-based meta-learner demonstrated superior performance with a sensitivity of 0.93, specificity of 0.70, and an AUC of 0.81. In contrast, the ResNet152-based model achieved a sensitivity of 0.81, specificity of 0.72, and an AUC of 0.76. Among the features, the history of cutaneous tumors (feature importance=0.23) and the output from image processing by ResNet50 (importance=0.13) were the most influential in the decision-making process, both showing a positive correlation with the diagnoses of RDEB-SCC, as indicated by the mean absolute SHAP values (Table 6,
Two additional meta-learner models were trained on the same dataset without background elements. Both models showed similar diagnostic accuracy when tested on the 181-image test set.
To directly compare clinician and AI diagnostic performances, AI meta-learners were evaluated using subsets A to E, previously assessed by clinicians. The earlier-reported moderate agreement among clinicians led us to use a majority vote approach to derive a consensus diagnosis based on the most common decision for each image. This majority vote was then benchmarked against the performances of the meta-learner models.
The dataset comprised multiple sets of images, with each set depicting the same lesion from different viewpoints or similar lesions on the same individual. To maintain data integrity, it was manually ensured that the sets were not divided between the training and test sets. Notably, there was no overlap between the images in the test and the training sets in terms of identical lesions with multiple viewpoints. A total of 924 professional images, comprising 587 EB SCC and 361 EB non-SCC, were allocated into distinct datasets for model training and validation. Specifically, the dataset was divided into a training set of 745 images (459 EB SCC and 282 EB non-SCC) and a test set of 181 images (128 EB SCC and 53 EB non-SCC), representing approximately 80% and 20% of the total dataset, respectively.
A diverse array of models was evaluated, encompassing both convolutional neural networks (CNNs) and a traditional machine learning algorithm (Random Forest), to address our research objectives. The CNN architectures examined include: (i) ResNet50, ResNet101, and ResNet152, which are part of the Residual Networks family known for their deep architectures facilitated by skip connections that mitigate the vanishing gradient problem; (ii) Xception, which leverages depthwise separable convolutions to enhance model efficiency and performance; (iii) VGG16, which is characterized by its simplicity and depth, employing consecutive 3×3 convolutional layers; and (iv) DenseNet121, which is distinct for its dense connectivity pattern, ensuring maximum information flow between layers in the network. Beyond CNNs, we explored the Vision Transformer (ViT), which applies the transformer architecture to image analysis, capturing global dependencies within the image. Lastly, the Random Forest model, a traditional machine learning algorithm, uses an ensemble of decision trees to reduce overfitting and improve prediction accuracy. Each model was selected for its unique architectural features or learning strategy, offering a comprehensive overview of current methodologies in image classification and analysis.
To enhance the robustness and generalizability of the model, a 5-fold cross-validation method was employed. This approach involved partitioning the training dataset into five equal-sized subsets. In each cross-validation cycle, four subsets were used for training the model, while the remaining subset served as the validation set to assess model performance. This process was repeated five times, with each subset used exactly once as the validation set, ensuring comprehensive evaluation and utilization of the data for model optimization.
Data augmentation was applied in all model training processes, with the exception of the Random Forest model. Data augmentation, a technique to artificially expand the diversity of a dataset by applying various transformations to the original images, can significantly impact the performance of deep learning models such as ResNet50, ResNet101, ResNet152, ViT, VGG16, and DenseNet121. By introducing variations such as rotations, flips, scaling, and color adjustments, data augmentation can enhance the models' ability to generalize from the training data to new, unseen data, potentially improving accuracy and robustness against overfitting. For CNNs like ResNet variants and VGG16, data augmentation exploits spatial hierarchies and invariances, making these models more adept at recognizing patterns and features under varied conditions. Similarly, for architectures like ViT that rely on global context and attention mechanisms, data augmentation can provide a richer set of contextual scenarios, promoting better feature extraction and classification performance. DenseNet121, known for its efficient feature reuse, can also benefit from augmented datasets by learning more diverse representations. However, the effectiveness of data augmentation is contingent on the relevance and diversity of the transformations applied, necessitating careful selection to match the specific challenges and characteristics of the task at hand.
The optimal model was determined through a comprehensive evaluation of its performance, utilizing a suite of statistical measures to ensure a balanced assessment across various aspects of predictive accuracy and reliability. Specifically, the model's effectiveness was gauged using Accuracy, which measures the overall correctness of predictions; Precision and Positive Predictive Value (PPV), which assess the model's ability to accurately predict positive instances; Recall/Sensitivity, indicating the model's capacity to identify all actual positive cases; F1 score, a harmonic mean of Precision and Recall, providing a single metric to balance the trade-off between them; Specificity, evaluating the model's ability to correctly identify negative instances; and Negative Predictive Value (NPV), which measures the accuracy of predicting negative outcomes. This multifaceted approach ensured that the selected model not only excels in identifying true positives and negatives but also maintains a low rate of false positives and negatives, embodying a robust predictive tool suitable for practical application.
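A minimal sketch of this evaluation suite, assuming scikit-learn; `y_true` and `y_pred` are toy placeholders for biopsy-confirmed labels and model predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # toy labels (1 = SCC)
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # toy predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision/PPV:", precision_score(y_true, y_pred))
print("Recall/Sensitivity:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("Specificity:", tn / (tn + fp))        # true negative rate
print("NPV:", tn / (tn + fn))                # negative predictive value
```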
Among the fine-tuned convolutional neural network (CNN) models, ResNet50 and ResNet152, alongside VGG16 and DenseNet121, demonstrated superior performance (Table 7), achieving F1 scores of 0.84, 0.84, 0.77, and 0.71, respectively. Further enhancement was observed when patient characteristics were integrated with the models. Meta-learner models based on ResNet50 and VGG16 emerged as the most effective, with F1 scores of 0.90 and 0.87, respectively.
Feature importance serves as a crucial metric for evaluating the contribution of each variable to the model's overall prediction accuracy. This ensemble method, comprising multiple decision trees, assesses feature importance by averaging the decrease in impurity—such as Gini impurity for classification tasks or variance for regression—attributed to each feature across all trees. Generally, a feature that significantly reduces impurity is deemed more critical for the model's performance. While most features exhibited importance scores around 0.05, the history of cutaneous tumors and the image-based model output showed substantially higher importance.
This application claims the benefit of U.S. Provisional Patent Application No. 63/593,845, filed on Oct. 27, 2023, which is incorporated by reference herein.