The present invention relates to methods and systems for detecting disease, and more specifically to detection of disease from a voice sample using machine learning.
Given the restrictions on mobility due to the global lock-down scenario resulting from the COVID epidemic, face-to-face medical consultations are difficult. However, the health industry continues to evolve and is adopting telemedicine to facilitate the accessibility of health services. Telemedicine is a blend of information and communication technologies with medical science. But telemedicine is limited by the lack of physical examination, which in turn may increase the number of incorrect diagnoses. Therefore, a physical examination seems to be a mandatory process for proper diagnosis in many situations. For example, every doctor has a stethoscope, but how many people own a personal stethoscope? Digital stethoscopes currently on the market usually do not pay off at a personal level, even in developed countries.
There is provided a solution that combines any standard stethoscope with a microphone having sufficient bandwidth for recording the sounds of the heart and/or the respiratory system, for example via a smartphone. A dedicated application identifies the device being used, records the auscultation for signs of specific symptoms, and feeds these recordings to an AI/machine learning algorithm. The present effort focuses on potential symptoms of upper respiratory tract infection, chronic obstructive pulmonary disease, and pneumonia, as these are the most common symptoms associated with COVID-19.
The same AI model can be deployed on different devices while the core Black Box Voice (BBV) process remains the same. Indeed, the BBV application can run on a personal device, such as a microcontroller, a mobile phone, or even a personal computer (PC).
According to the present invention there is provided a method for detecting infection from a voice sample, the method including: (a) generating machine learning (ML) training data, including: (i) collecting raw data from a plurality of specimens, for each specimen: capturing an audio recording of internal sounds of the specimen inhaling and exhaling, capturing an audio recording of external sounds of the specimen inhaling and exhaling, and receiving medical data, such that the training data includes: (A) an internal dataset of a plurality of the audio recordings of internal sounds of a plurality of specimens inhaling and exhaling, (B) an external dataset of a plurality of the audio recordings of external sounds of the plurality of specimens inhaling and exhaling, and (C) a medical dataset of medical information related to each of the specimens; (ii) processing the internal and external datasets to generate processed data and metrics for each of the internal and external datasets; (iii) correlating between the internal dataset, the external dataset and the medical dataset; (b) training a ML model based on the training data; (c) classifying a newly received audio recording of external sounds of a user, using the ML model; and (d) outputting a metric determining a health status of the user.
According to further features the audio recording of internal sounds and the audio recording of external sounds are synchronized. According to further features the audio recording of internal sounds and the audio recording of external sounds are unsynchronized. According to further features each audio recording of internal sounds is captured by a specialized recording device approximating auscultation of a thorax. According to further features each audio recording of internal sounds is captured by pressing an audio recorder against a thorax of the specimen. According to further features each audio recording of external sounds is captured by a commercial recording device. According to further features each audio recording of external sounds is captured by a recording device held away from a face of the specimen.
According to further features the specimen inhaling and exhaling is achieved by the specimen performing at least one action selected from the group including: coughing, counting, reciting a given sequence of words.
According to further features the processing includes: bandpass filtering of raw data of the internal dataset and the external dataset to produce a bandpass filtered data set.
According to further features processing includes: detecting a rhythm in each of the plurality of audio recordings of external sounds or the plurality of audio recordings of the internal sounds. According to further features the rhythm is compared to a reference rhythm having an associated reference tempo, and a data set tempo generated for the external dataset or the internal dataset, the data set tempo being in reference to the associated reference tempo. According to further features the method further includes: adjusting the data set tempo to match the reference tempo, thereby producing a prepared data set and a corresponding tempo adjustment metric.
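The tempo adjustment described above can be sketched as follows. This is a hedged illustration, not the claimed implementation: it assumes the data set tempo has already been detected (e.g., via onset autocorrelation against the reference rhythm), and plain linear resampling stands in for a pitch-preserving time-stretch.

```python
import numpy as np

def adjust_tempo(samples, detected_bpm, reference_bpm):
    """Stretch or compress a recording so its detected tempo matches the
    reference tempo, returning the prepared data set and the corresponding
    tempo adjustment metric.

    Illustrative sketch: linear interpolation resampling; a production
    pipeline would use a pitch-preserving time-stretch.
    """
    ratio = detected_bpm / reference_bpm        # tempo adjustment metric
    n_out = int(round(len(samples) * ratio))    # slower playback lengthens the clip
    src_pos = np.linspace(0, len(samples) - 1, n_out)
    prepared = np.interp(src_pos, np.arange(len(samples)), samples)
    return prepared, ratio

# Usage: breathing recorded at 18 cycles/min, reference tempo 15 cycles/min.
sig = np.sin(np.linspace(0, 2 * np.pi * 3, 3000))
prepared, metric = adjust_tempo(sig, detected_bpm=18, reference_bpm=15)
# metric is 1.2; prepared is the recording stretched to the reference tempo
```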
According to further features the method further includes: detecting and removing spoken portions of the prepared data set to produce a voice-interims data set.
Various embodiments are herein described, by way of example only, with reference to the accompanying drawings, wherein:
The principles and operation of learning model and diagnostic methodology according to the present invention may be better understood with reference to the drawings and the accompanying description.
It is noted that throughout this document the terms Artificial Intelligence (AI), Machine Learning (ML), Neural Network (NN), Deep Learning and similar terms are used interchangeably and only for the purpose of example. The term Machine Learning (ML) will be used herein as a catchall phrase to indicate any type of process, algorithm, system and/or methodology that pertains to machine learning, such as, but not limited to, AI, ML, NN, Deep Learning and the like.
Audio Processing is an integral part of the instant system. The instant systems and methods deal with biomedical signals. Accordingly, it is necessary to ensure that only the data of interest is examined from the signal and everything else is filtered out.
In subjects with healthy lungs, the frequency range of the vesicular breathing sounds extends to 1,000 Hz, and the majority of power within this range is found between 60 Hz and 600 Hz. Other sounds, such as wheezing or stridor, can sometimes appear at frequencies above 2,000 Hz. In the range of lower frequencies (<100 Hz), heart and muscle sounds overlap. This range of lower frequencies (<100 Hz) is preferably filtered out for the assessment of lung sounds.
Hence, to reduce the influence of heart and muscle sounds, as well as noise, and to prevent aliasing (misidentifying a signal frequency, introducing distortion or error), all sound signals are band pass-filtered, using a band pass of 100 Hz to 2,100 Hz.
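A minimal illustration of such band-pass filtering follows, using a simple FFT mask; this is a sketch only, and a production system would more likely apply a designed IIR/FIR filter (e.g., a Butterworth band-pass) to avoid edge artifacts.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz=100.0, high_hz=2100.0):
    """Zero out spectral components outside [low_hz, high_hz].

    Minimal FFT-mask illustration of the 100 Hz to 2,100 Hz band-pass
    described above; not a substitute for a proper filter design.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))

# Example: a 50 Hz "heart/muscle" tone plus a 500 Hz lung-sound tone;
# only the 500 Hz component survives the band-pass.
sr = 8000
t = np.arange(sr) / sr          # one second of audio
raw = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 500 * t)
filtered = bandpass_fft(raw, sr)
```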
An exemplary implementation of using the instant system is now described. Implementations include methods for determining a state of a person's body, in particular the person's health status, detection of disease, illness, and/or condition of the body.
1. A first dataset is provided, typically as raw data of audio recordings from the person's chest area (thorax) while the person is inhaling and exhaling (speaking, coughing, breathing). The term “person”, as used herein to denote the individual from whom the raw audio data is attained, may also be referred by the terms “subject”, “specimen”, “participant”, “sample source”, variations thereof and similar phrases. These terms or phrases are used interchangeably herein.
The first dataset is referred to in this document as the “true”, “actual”, “inside” or “internal” data. For best results, the first dataset (also referred to herein as “internal dataset”) should be recorded as accurately as possible, using a high-quality device. One such device is a digital stethoscope. As is known in the field, “auscultation” is the medical term for using a stethoscope to listen to the sounds inside of a body. For the current method, auscultation of the lungs is preferred. In examples, each audio recording of internal sounds is captured by a specialized recording device approximating auscultation of a thorax. In examples, each audio recording of internal sounds is captured by pressing an audio recorder against a thorax of the specimen.
2. A second dataset is provided, typically as raw data of a person inhaling and exhaling, similar to the first dataset. The second dataset (also referred to herein as the external dataset) is referred to in the context of this document as “environmental”, “measured”, “outside”, or “external” data, and is provided using a commercial microphone (such as a built-in smartphone or personal computer (PC) microphone). Each audio recording of external sounds is captured by a recording device held away from the subject's face. Preferably the audio in the two datasets (first and second datasets) are the same, for example, the person coughing, counting, or speaking a given sentence or reciting a given sequence of words. Preferably, the first and second datasets are captured at the same time, for best correlation, for example, microphones synchronized in time. However, this is not limiting, and the recordings of the first and second datasets can be unsynchronized recordings of the same audio (same reading or noise like coughing).
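Where the two recordings are unsynchronized, their relative offset can be estimated afterward. The sketch below is an assumption-laden illustration (both recordings are taken to share one sample rate, with no normalization or windowing) of alignment by cross-correlation:

```python
import numpy as np

def align_offset(internal, external):
    """Estimate the lag (in samples) of the external recording relative
    to the internal one via cross-correlation, for unsynchronized captures
    of the same audio event.
    """
    corr = np.correlate(external, internal, mode="full")
    return int(np.argmax(corr)) - (len(internal) - 1)

# Usage: the external recording starts 250 samples after the internal one.
rng = np.random.default_rng(0)
inner = rng.standard_normal(2000)
outer = np.concatenate([np.zeros(250), inner])
# align_offset(inner, outer) recovers the 250-sample delay
```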
3. A medical dataset of medical information related to each person for whom internal and external recordings are provided. The medical information may include medical diagnosis of the health status of the person. For example, whether the person is healthy or ill, what diseases (if any) the person has, and/or what is the state of the person's health. Diseases include, but are not limited to: Covid-19, flu, cold, Chronic Obstructive Pulmonary Disease (COPD), Pneumonia, and cancer. Health status may also include pregnancy and alcohol consumption. In the context of this document, for simplicity of description, the terms “medical information”, “disease” and “health status” may be used interchangeably for diseases, health status, and other status (such as gender).
4. For each of the first and second (internal and external) datasets various layers of processing can be done to generate a variety of processed data and metrics. Some exemplary layers of processing include the following:
5. Once pre-processing has been completed, a variety of processed data and metrics has been generated for each data set, corresponding between the first and second datasets. Next, correlating is done between the first and second datasets of the processed data and metrics to generate training data. A typical correlation includes the prepared first data set, the first tempo adjustment metric, the first voice-interim data set, the prepared second data set, the second tempo adjustment metric, the second voice-interim data set, and the health status/medical information. The training data preferably includes the results of the correlating step, as well as the data used to do the correlation. The training data may include other data, whether or not used to do the correlation.
The output metrics from the classifier may be post-processed as appropriate to generate more meaningful, or alternative representations of the person's health status.
For completeness, a run-through of using an example implementation of the system in a software application is detailed hereafter.
If too few cycles are run, the network may not manage to learn everything it can from the training data. However, if too many cycles are run, the network may start to memorize the training data and will no longer perform well on data it has not seen before. This is called overfitting. As such, the aim is to get maximum accuracy by tweaking the parameters.
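One common way to balance these two failure modes is early stopping on a held-out validation split. The toy sketch below is a hedged illustration only, using a logistic-regression stand-in rather than the actual network, and synthetic rather than audio data:

```python
import numpy as np

def train_with_early_stopping(X, y, epochs=500, lr=0.5, patience=10):
    """Gradient-descent logistic regression that halts once validation
    loss stops improving, limiting the overfitting described above.

    Toy sketch: an 80/20 train/validation split, with the best weights
    (lowest validation loss) retained.
    """
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    split = int(0.8 * len(X))
    tr, va = idx[:split], idx[split:]
    w = np.zeros(X.shape[1])
    best_loss, best_w, bad = np.inf, w.copy(), 0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X[tr] @ w))          # training predictions
        w -= lr * X[tr].T @ (p - y[tr]) / len(tr)      # gradient step
        pv = 1.0 / (1.0 + np.exp(-X[va] @ w))          # validation predictions
        val_loss = -np.mean(y[va] * np.log(pv + 1e-12)
                            + (1 - y[va]) * np.log(1 - pv + 1e-12))
        if val_loss < best_loss:
            best_loss, best_w, bad = val_loss, w.copy(), 0
        else:
            bad += 1
            if bad >= patience:    # validation stopped improving: halt
                break
    return best_w, best_loss

# Usage: linearly separable synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)
w, val_loss = train_with_early_stopping(X, y)
```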
The ‘minimum confidence rating’ refers to the threshold at or below which a sample will be disregarded. For example, with a setting of 0.8, when the neural network predicts that some audio contains a noise, the machine learning (ML) algorithm will disregard that prediction unless its confidence is above the 0.8 threshold.
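The thresholding rule can be sketched as follows; the function name and sample values are illustrative only, not part of the claimed system:

```python
def filter_predictions(probs, labels, min_confidence=0.8):
    """Keep only predictions whose confidence is strictly above the
    minimum confidence rating; samples at or below it are disregarded.
    """
    return [(label, p) for label, p in zip(labels, probs)
            if p > min_confidence]

# Usage: only the 0.93-confidence "cough" prediction survives an 0.8
# threshold; the prediction exactly at 0.80 is disregarded.
results = filter_predictions([0.93, 0.80, 0.42],
                             ["cough", "cough", "noise"])
# results == [("cough", 0.93)]
```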
People infected with Covid-19 or other diseases who have no symptoms show, by definition, no noticeable physical signs of the disease. Thus, they are less likely to seek virus testing and can spread the infection without their knowledge. But asymptomatic people are not completely free of changes caused by the virus. Testing has found that asymptomatic people sound different from healthy people. These differences cannot be deciphered by the human ear, but they can be detected by artificial intelligence. An AI model, according to the instant disclosure, is able to differentiate between asymptomatic and healthy people through forced cough recordings, which people transmit via Internet web browsers or dedicated applications, using devices such as PCs, laptops, tablets, cell phones and other devices.
Applicants trained the model on hundreds of cough samples as well as spoken words. When new cough recordings were fed into the model, the model accurately identified 98.5 percent of the coughs from people who were confirmed to have Covid-19, including 100 percent of the coughs from asymptomatic patients, who reported having no symptoms but tested positive for the virus.
A user-friendly mobile application (hereafter “testing app”) is provided as a non-invasive pre-screening tool to identify people who may be symptomatic or asymptomatic for Covid-19. For example, a user can log in daily, cough on their phone and get immediate information on whether s/he may be infected. When a positive result is received, the app user may be directed to confirm the result with a formal examination, such as a PCR test. In some implementations, a formal test is not required due to the proven accuracy of the system.
A biomarker is a factor objectively measured and evaluated which represents a biological or pathogenic process, or a pharmacological response to a therapeutic intervention, which can be used as a surrogate marker of a clinical endpoint [19]. A vocal biomarker is a signature, a feature, or a combination of features from the audio signal of the voice and/or cough that is associated with a clinical outcome and can be used to monitor patients, diagnose a condition, grade the severity or the stages of a disease, or support drug development. It must have all the properties of a traditional biomarker: it must be validated analytically, qualified using an evidentiary assessment, and utilized.
According to embodiments there is provided a Black Box Voice (BBV) App (also referred to as “testing app”) which is a software app which analyzes vocal biomarkers and uses Artificial Intelligence (AI) as a medical screening tool. The software application can detect COVID-19, using a combination of unique vocal samples and a trained AI model and provide a positive/negative indication within minutes of the sampling operation, all on a common mobile smartphone.
The testing app can distinguish symptomatic, as well as asymptomatic, COVID-19 patients from healthy individuals. The coronavirus (even in asymptomatic patients) initially causes infection in the areas of the nasal cavity and throat and then infects the lungs. Therefore, the voice-affecting parts of the body are the nasal passages, the throat and the lungs. These changes can be detected at any stage of COVID-19 infection. Based on these vocal changes, there is a distinct vocal biomarker consisting of a combination of features from the audio signal of the acquired voice and cough signals that is associated with a clinical outcome and can be used to diagnose COVID-19.
The interactive app instructs the patient to count to three and then cough three times. The smartphone microphone captures the voice and cough samples and converts the audio signals into “features”, meaning the most dominating and discriminating characteristics of the signal, which feed the detection algorithm. These “features” include prosodic features (e.g., energy), spectral characteristics (e.g., centroid, bandwidth, contrast, and roll-off), and voice quality (e.g., zero crossing rate), as well as other methods of analysis including Mel-Frequency Cepstral Coefficients (MFCCs), Mel spectrograms, etc.
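A few of the named features can be computed directly. The sketch below is a hedged numpy illustration, not the app's actual feature extractor, covering short-time energy (prosodic), zero crossing rate (voice quality), and spectral centroid (spectral); MFCCs, Mel spectrograms, bandwidth, contrast and roll-off would in practice come from an audio library such as librosa.

```python
import numpy as np

def basic_voice_features(frame, sample_rate):
    """Compute a small subset of the signal "features" named above."""
    energy = float(np.mean(frame ** 2))                 # short-time energy
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))  # zero crossing rate
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    return {"energy": energy, "zcr": zcr, "centroid": centroid}

# Usage: a pure 440 Hz tone has its spectral centroid near 440 Hz.
sr = 8000
t = np.arange(sr) / sr
feats = basic_voice_features(np.sin(2 * np.pi * 440 * t), sr)
```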
The BBV App algorithm, located at the backend and operating on the selected “features”, automatically classifies the incoming data according to the appropriate clinical outcome (i.e., positive or negative for COVID-19). The results are presented to the user on the smartphone screen within 60 seconds.
Once the app is installed, a test may be started according to the following steps, depicted in app screens in
Start the app by clicking the Start button on the screen in
Validation of the BBV App device software was performed according to the IEC 62304:2006/AMD1:2015 standard for medical device software life-cycle processes, including usability engineering for medical devices. The software related documents were composed according to the specific IEEE standards and the FDA Guidance for the Content of Premarket Submissions for Software Contained in Medical Devices. The BBV App device software was finalized and frozen prior to pursuing the current clinical study. The following software validation documents are maintained on file at the company as part of the Design History File:
Following the above described algorithm development and validation, the algorithm parameters of the BBV App to detect COVID-19 were finalized. A pilot clinical study, consisting of 546 subjects, was performed to further validate the BBV App device algorithm in the clinical environment in which it is intended to be used. The pilot clinical study is described here.
A preliminary study was conducted to compare the BBV App device results obtained from voice recordings for the non-invasive detection of COVID-19, using invasive nasal swab specimens and PCR analysis findings as the gold reference standard. Active enrollment took place from March 2021 to July 2021 at Ashdod Rashbi Medical Center, Maccabi Health Care Services (Ashdod, Israel, Dr. Gil Siegal-PI) and Al Mazroui Medical Center (Dubai, United Arab Emirates, Dr. Vinash Kamal-PI). A total of 546 eligible subjects were enrolled in the study. The study population, who represent the target population for this procedure, consisted of healthy subjects and subjects with known or suspected COVID-19 disease, who were scheduled for invasive nasal swab tests, which were subsequently analyzed using the polymerase chain reaction (PCR) test method.
Subjects of both genders, >18 years of age were recruited to the study.
Nasal swab specimen acquisition was performed in a routine fashion in healthy subjects and in subjects with suspected COVID-19 disease. PCR testing of each nasal swab specimen was analyzed and served as the gold standard reference. The BBV Medical App results were not used for diagnostic or clinical decisions. The blinding status was maintained until the last subject completed the study at each site.
The dichotomous determination (positive or negative) of the BBV App device result per patient was compared to the PCR result for the same patient. The sensitivity and specificity of the BBV App device was calculated. Furthermore, the BBV App device accuracy, positive predictive value and negative predictive value were determined.
A total of 546 subjects were analyzed in the main analysis set. The data represents the general patient population undergoing testing for COVID-19 and in whom the BBV App device may potentially be used.
In 403/546 (73.8%) of the subjects, the PCR results confirmed a negative finding for COVID-19, and in 143/546 (26.2%) of the subjects the PCR results indicated a positive finding for COVID-19. The BBV App device indicated a negative finding in 406/546 (74.4%) of the subjects and a positive finding in 140/546 (25.6%) of the subjects. Statistical analysis of the 546 study subjects presented results for the study primary endpoints of sensitivity and specificity of 97.9% and 100%, respectively, exceeding the goal of the primary objective. The lower limits of the 95% confidence intervals, at 95.6% and 100% respectively, likewise demonstrate the successful achievement of the primary objective goals for sensitivity and specificity. The exact binomial P-values (1-sided) were <0.001, deeming the results in 546 subjects statistically significant.
The first secondary endpoint presented 99.5% accuracy in correctly measuring a positive or negative result. The second and third secondary endpoints, Positive Predictive Value (PPV) (100%) and Negative Predictive Value (NPV) (99%), further demonstrate the successful achievement of the secondary objectives of the study.
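These figures are mutually consistent with the reported PCR and app counts. The cross-check below infers the confusion-matrix cells from the totals stated above (143 PCR-positive, 403 PCR-negative, 140 app-positive, 406 app-negative, with the reported 100% specificity implying zero false positives); the cell values are a derivation, not additional study data.

```python
# Inferred confusion-matrix cells from the reported study totals.
tp, fn = 140, 3     # PCR-positive subjects: app detected 140, missed 3
tn, fp = 403, 0     # PCR-negative subjects: no false positives reported

sensitivity = tp / (tp + fn)                 # 140/143 -> 97.9%
specificity = tn / (tn + fp)                 # 403/403 -> 100%
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 543/546 -> 99.5%
ppv = tp / (tp + fp)                         # 140/140 -> 100%
npv = tn / (tn + fn)                         # 403/406 -> 99.3%, reported as 99%

print(f"sens={sensitivity:.3f} spec={specificity:.3f} "
      f"acc={accuracy:.3f} ppv={ppv:.3f} npv={npv:.3f}")
```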
Thus, the study primary and secondary endpoints have been demonstrated as successfully met and support the safety and efficacy of the BBV App device for its intended use of detecting COVID-19 using voice recordings.
The clinically and statistically significant results of the BBV App device demonstrate an effective screening device for providing an accurate and clinically meaningful COVID-19 result using non-invasive voice recordings. The use of the BBV App device as a screening tool is an effective means of detecting COVID-19 infection, with additional caveats for interpreting positive and negative results (as stated in the device description section below), and can assist the physician in quickly determining treatment options. The results of the above clinical study will be corroborated in the current clinical study, in which usability in the hands of potential end users will also be assessed.
Following the pilot validation clinical study consisting of 546 subjects, the stage 2 validation clinical study will be conducted and is described in the study protocol attached to this Helsinki Submission. The Stage 2 validation clinical study will be conducted according to the MOH—Department of Laboratories—Guidelines for Validation of Point of Care Testing (POCT) for detecting the SARS-CoV-2 Virus (18 Nov. 2020).
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made. Therefore, the claimed invention as recited in the claims that follow is not limited to the embodiments described herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2022/050750 | 1/28/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63142522 | Jan 2021 | US |