The present invention concerns a method and automated apparatus for diagnosing maladies such as, though not limited to, Obstructive Sleep Apnea (OSA) from patient sounds.
The present application claims priority from Australian provisional patent application No. 2018903933 filed 17 Oct. 2018, the disclosure of which is hereby incorporated herein by reference.
Any references to methods, apparatus or documents of the prior art are not to be taken as constituting any evidence or admission that they formed, or form part of the common general knowledge.
One common malady is the sleep disorder of Obstructive Sleep Apnea syndrome (OSA). The prevalence of OSA in adults varies from 17-26% in males and 9-28% in females [2]. At present over 85% of OSA patients remain undiagnosed [3]. OSA is characterized by a repetitive upper airway collapse during sleep. Full closure of the upper airway is termed “apnea” and partial closure is termed “hypopnea”. The average number of apnea and hypopnea events per-hour of sleep is termed the Apnea-Hypopnea Index (AHI). AHI is a major clinical severity measure for OSA.
The current standard for OSA diagnosis is Polysomnography (PSG)[4]. PSG requires continuous monitoring of multiple physiological signals over the course of a night. Physical contact of sensors with the patient is essential for these measurements. The several hours of PSG data are manually reviewed by an expert sleep technician. Reviewing PSG data is a labor intensive, time consuming and expensive process. PSG is also inconvenient to patients, especially the pediatric population, and results are subjective and unsuitable for population screening.
In the past several researchers have attempted to use patient sounds for diagnosis of maladies related to dysfunctions of the respiratory system. For example, the patient sounds may include snoring sounds used to diagnose OSA. Other maladies, such as pneumonia, asthma, bronchitis, croup, chronic obstructive pulmonary disease (COPD), tracheobronchomalacia (TBM) and cystic fibrosis, also cause characteristic patient sounds. Many of the existing methods depend on the identification of segments of the patient sound that are characteristic of the malady in question. For example, where the malady is OSA, snore segments are identified from the overnight sound data. Hence if the snore segmentation algorithm fails to identify any snore segments, or if the patient did not snore, then the results of the test will be indeterminate. Furthermore, procedures for identifying sounds that are characteristic of a malady of interest in a lengthy patient sound recording, such as snore sounds for OSA diagnosis or cough sounds for pneumonia diagnosis, are computationally expensive and may be inaccurate. Therefore there is a need for an improved method of diagnosing a malady which does not rely on identifying segments of the patient sounds that are characteristic of the malady of interest.
According to a first aspect of the present invention there is provided a method for diagnosing a malady of a patient from sounds of the patient including the steps of:
For example, the malady may comprise OSA or a disease state such as pneumonia, asthma, bronchitis, croup, chronic obstructive pulmonary disease (COPD), tracheobronchomalacia (TBM) or cystic fibrosis, or another malady that causes a change from normal patient sounds.
The features may be one or more of pitch, entropy, formants, a Gaussianity or other probability distribution measure and higher-order spectra-based features.
An embodiment of the invention may involve computing a Chi-squared test statistic between a MFCC distribution and a target probability distribution and using the computed test statistic directly as a feature to input to the decision machine.
Another embodiment of the invention may involve computing p-values for a Chi-squared test statistic between a MFCC distribution and the target distribution and using the p-value directly as a feature to feed the decision machine.
The target distribution may be a Gaussian distribution.
Alternatively, other embodiments may involve computing a Kolmogorov-Smirnov (KS) test statistic in the place of the Chi-squared test statistic.
Another embodiment of the invention may make use of a Lilliefors test for normality with the Gaussian distribution.
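As a minimal sketch of the kind of statistic referred to above, the following Python code computes a Kolmogorov-Smirnov distance between the empirical distribution of a set of values and a Gaussian fitted to them. The function name, the synthetic data and the choice to fit the Gaussian from the sample mean and standard deviation are assumptions made for illustration and are not taken from the specification. Fitting the parameters from the data is the setting of the Lilliefors test, which uses the same statistic with different critical values.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_vs_gaussian(values):
    """Return the KS distance between the empirical CDF of `values`
    and a Gaussian fitted to their sample mean and standard deviation."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d


# Hypothetical data standing in for one MFCC observed over many sub-segments.
rng = random.Random(0)
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]

print(ks_statistic_vs_gaussian(gaussian_like))  # small: close to Gaussian
print(ks_statistic_vs_gaussian(skewed))         # larger: clearly non-Gaussian
```

The resulting statistic could be passed to a decision machine as a feature in the same way as the Chi-squared statistic described above.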
According to a further aspect of the present invention there is provided a method for diagnosing OSA of a patient including the steps of:
According to another aspect of the present invention there is provided a method of operating one or more electronic processors to diagnose the presence of Obstructive Sleep Apnea (OSA) of a patient comprising:
According to a preferred embodiment of the present invention the forming of the test vector based upon the deviation scores of the MFCCs includes applying a comparator to each of the deviation scores. For example, the comparator may comprise a set of instructions executed by the one or more processors to implement a decision routine.
In an embodiment the output of the routine is a “1” signal if the deviation score is above a threshold or a “0” signal if the deviation score is equal to or below the threshold.
Preferably the method further includes forming components of the test vector for each of the MFCCs by producing sums of outputs from the comparator. In an embodiment the method includes producing the sums of the outputs from the comparator for each MFCC over all of the epochs.
The method may include averaging each of the sums of the outputs over all of the epochs.
In a preferred embodiment of the invention the method includes reducing dimensionality of the test vector. For example, the method may include removing all but a subset of components of the test vector previously adjudged to be statistically significant for production of the OSA signal from the pre-trained decision machine.
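The comparator, summing, averaging and dimensionality-reduction steps described above can be sketched as follows. This is an illustration under stated assumptions only: the function name `build_test_vector`, the threshold value, the example scores and the retained indices are all hypothetical.

```python
def build_test_vector(deviation_scores, threshold, keep_indices=None):
    """Form a test vector from per-epoch deviation scores.

    deviation_scores[e][m] is the deviation score of MFCC m in epoch e.
    """
    n_epochs = len(deviation_scores)
    n_mfcc = len(deviation_scores[0])
    sums = [0] * n_mfcc
    for epoch_scores in deviation_scores:
        for m, score in enumerate(epoch_scores):
            # Comparator: "1" if the score is above the threshold, else "0".
            sums[m] += 1 if score > threshold else 0
    # Average each sum over all of the epochs.
    averaged = [s / n_epochs for s in sums]
    if keep_indices is None:
        return averaged
    # Dimensionality reduction: keep only the pre-selected components.
    return [averaged[m] for m in keep_indices]


# Hypothetical scores for 4 epochs and 3 MFCCs.
scores = [
    [0.10, 0.60, 0.30],
    [0.70, 0.80, 0.20],
    [0.20, 0.90, 0.10],
    [0.90, 0.40, 0.05],
]
full = build_test_vector(scores, threshold=0.5)
reduced = build_test_vector(scores, threshold=0.5, keep_indices=[0, 1])
print(full)     # [0.5, 0.75, 0.0]
print(reduced)  # [0.5, 0.75]
```

Each component of the resulting vector is the fraction of epochs in which the corresponding MFCC exceeded the threshold, so the vector length does not depend on the duration of the recording.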
Preferably the method includes forming the test vector on the basis of the entire digital audio signal.
In one embodiment of the invention the probability distribution is a Gaussian distribution and the deviation from a probability distribution score is a non-Gaussianity Score (NGS) or non-Gaussianity “Index” though other distributions may also be used and measures of deviation from those distributions may also be used.
For example, other embodiments may involve computing a Kolmogorov-Smirnov (KS) test statistic in the place of the Chi-squared test statistic.
Another embodiment of the invention may make use of a Lilliefors test for normality with the Gaussian distribution.
According to a further aspect of the present invention there is provided an apparatus for diagnosing the presence of Obstructive Sleep Apnea (OSA) of a patient comprising:
According to another aspect of the present invention there is provided a computer readable medium bearing tangible, non-transitory machine readable instructions for execution by one or more electronic microprocessors including instructions for:
In one embodiment of the invention the distribution is a Gaussian distribution and the deviation from probability distribution score assembly is a non-Gaussianity score (NGS) assembly and the deviation score is a non-Gaussianity Score or “index”. It will be realized that other distributions are also useable and encompassed by embodiments of the present invention and some of these other distributions are described toward the end of this specification.
According to a further aspect of the present invention there is provided a method for diagnosing OSA of a patient including the steps of:
Preferred features, embodiments and variations of the invention may be discerned from the following Detailed Description which provides sufficient information for those skilled in the art to perform the invention. The Detailed Description is not to be regarded as limiting the scope of the preceding Summary of the Invention in any way. The Detailed Description will make reference to a number of drawings as follows:
Referring initially to
The microprocessor 3 is in data communication with a plurality of peripheral assemblies 9 to 23, as indicated in
Although the OSA diagnostic device 1 that is illustrated in
In a preferred embodiment the OSA diagnostic device 1 is programmed with App 6 so that it operates as a decision device that requires no external sensors, physical contact with patient 2 or communication network 31.
In use the nominal distance from the microphone 25 of device 1 to the face of patient 2 is set to about 50 cm, but may vary between 40 cm and 70 cm due to patient movements.
Referring now to
At box 41 of
The breathing sound 39 of patient 2 is recorded by the diagnostic device 1 and
As the recording proceeds an audio file is stored in an electronic storage assembly such as either memory 5 or secondary memory 14, which is typically a Secure Digital (SD) memory card. The audio file may be stored in a compressed format such as MP3 or in a non-compressed format such as a WAV or FLAC file. The pros and cons of using a compressed format as opposed to an uncompressed format will be discussed later in this specification. Depending on the hardware configuration the selection of the sample rate may alter a sample rate parameter in Audio Interface 21 or alternatively the analog-to-digital conversion may be made at 44.1 kHz in the audio interface 21 and then down-sampled by the microprocessor 3 in accordance with instructions in OSA Application 6.
The procedure that microprocessor 3 uses to make a diagnosis of a malady, which in the present example is OSA, and which comprises instructions that make up App 6 is illustrated in the flowchart of
Producing the LRM
In order to create the trained Logistic Regression Machine (LRM) 20 the Inventors initially recorded sounds from Q=41 patients, including individuals with symptoms such as daytime sleepiness, snoring, tiredness, lethargy, etc., who were suspected of OSA. It will be realised that a similar procedure is followed in order to train the LRM for detection of other maladies and that in that case sounds would be recorded from patients suffering from the malady in question.
The steps that have previously been described in relation to boxes 43 to 61 of
Pattern Classifier
As previously discussed, App 6 includes instructions for implementation of a logistic-regression model (LRM) as the “pattern classifier” or “decision machine” for classifying test patient sounds as indicating, or not indicating, a malady, which is OSA in the exemplary embodiment. It will be realized that in other embodiments of the invention other types of decision machine may also be used, such as trained neural nets, Bayesian decision machines and support vector machines, and that other maladies, such as those previously referred to, may be the subject of the training of the pattern classifier or decision machine.
The LRM that is implemented by App 6 in the present embodiment of the invention is the best LRM that could be determined by the methodology that the Inventors have devised and which will now be described.
An LRM is a generalized linear model which uses several independent features to estimate the probability of a categorical event (dependent variable). In the present case, the dependent variable Y is assumed to be equal to ‘one’ (Y=1) for ‘OSA’ subjects and ‘zero’ (Y=0) for ‘non-OSA’ subjects. OSA and non-OSA subjects were defined using three different AHI thresholds, AHI=[5; 15; 30]. These AHI thresholds are routinely used in clinical practice to define the severity of OSA as follows:
As is known in the prior art, an LRM model is derived using a regression function to estimate the probability Y given the independent features in Ψc as follows:
In (6), β0 is called the intercept and β1, β2 and so on are called the regression coefficients of the independent variables. Receiver Operating Characteristic (ROC) curve analysis was used to select the optimal decision threshold λ for Y (the subject is classified as OSA if Y>λ and as non-OSA otherwise).
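The following Python sketch shows the standard logistic-regression form, in which an intercept and regression coefficients are combined with the features and mapped to a probability that is then compared with a decision threshold. It is offered as an illustration only and is not a reproduction of the specification's own equation. The function names and the numerical values of the coefficients, features and threshold are hypothetical.

```python
import math


def lrm_probability(features, intercept, coefficients):
    """Standard logistic model: P(Y=1) = 1 / (1 + exp(-z)),
    where z = intercept + sum of coefficient * feature."""
    z = intercept + sum(b * f for b, f in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, intercept, coefficients, decision_threshold):
    """Label the subject OSA if the estimated probability exceeds the threshold."""
    y = lrm_probability(features, intercept, coefficients)
    return "OSA" if y > decision_threshold else "non-OSA"


# Hypothetical values for illustration only.
beta0 = -1.0
betas = [2.0, 1.5]
test_vector = [0.5, 0.75]

print(lrm_probability(test_vector, beta0, betas))
print(classify(test_vector, beta0, betas, decision_threshold=0.5))
```

In practice the decision threshold would be chosen from the ROC analysis of the training data rather than fixed at 0.5 as in this example.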
The Inventors used a K-fold cross validation (KCV) technique for the LRM design, setting K=10. In the KCV technique, the subject population in the database is randomly partitioned into K equal-size non-overlapping subsamples. Of the K subsamples, data from subjects in K−1 subsamples are used to train the LRM model and data from subjects in the remaining subsample are used to test the model. This process is systematically repeated K times such that each patient in the database is used to test the model exactly once. At the end of this process there are K different LRM models. To evaluate the performance of the K designed LRMs, performance measures such as Sensitivity (Sn), Specificity (Sp), Accuracy (Ac), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were computed.
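A minimal sketch of the partitioning and of the listed performance measures is given below, assuming subjects are identified by integer indices. The function names, the random seed and the example confusion counts are hypothetical. With Q=41 and K=10 the folds cannot be exactly equal in size, so this sketch produces folds whose sizes differ by at most one.

```python
import random


def k_fold_partitions(subject_ids, k, seed=0):
    """Randomly partition subjects into k non-overlapping, near-equal folds."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def performance(tp, fp, tn, fn):
    """Compute the performance measures from confusion-matrix counts."""
    return {
        "Sn": tp / (tp + fn),                    # sensitivity
        "Sp": tn / (tn + fp),                    # specificity
        "Ac": (tp + tn) / (tp + fp + tn + fn),   # accuracy
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
    }


folds = k_fold_partitions(range(41), k=10)
for i, test_fold in enumerate(folds):
    # Train on every subject outside the held-out fold, test on the fold.
    train_ids = [s for j, f in enumerate(folds) if j != i for s in f]
    assert len(train_ids) + len(test_fold) == 41

# Hypothetical confusion counts for illustration only.
print(performance(tp=8, fp=2, tn=18, fn=2))
```

Partitioning by subject rather than by epoch keeps all data from one patient in the same fold, so that each patient is used for testing exactly once.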
Feature Selection
Feature selection is a technique of selecting a subset of features for building a robust classifier. Optimal feature selection requires an exhaustive search of all possible subsets of features. However, it is impractical to do so when large numbers of features are used as candidate features. Therefore, an alternative approach based on p-values was used to determine significant features. During LRM design, a p-value can be computed for each feature to indicate how significant that feature is to the model. Important features have low p-values. The Inventors used this property of an LRM to select, during the training phase, a reasonable combination of features that facilitate the classification in the model. The technique that was used consisted of computing the mean p-value associated with each feature in Ψc over the K LRM models and then selecting the features with a mean p-value less than a threshold pths. Let Ψsc be the feature vector containing the subset of selected MFCC component indices and Mfs (of size Q×Ψsc) be the feature matrix computed from the selected features.
Once the significant features were known and selected they were used to build a new set of LRMs, following K-fold cross validation (K=10) as previously described. At the end of this process, Kfs LRMs had been produced using the selected features.
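The selection rule described above, averaging each feature's p-value over the cross-validated models and keeping the features whose mean falls below a threshold, can be sketched as follows. The function name, the p-values and the threshold are hypothetical and are used for illustration only.

```python
from statistics import fmean


def select_features(p_values_per_model, p_threshold):
    """Return indices of features whose mean p-value is below the threshold.

    p_values_per_model[k][j] is the p-value of feature j in the k-th model.
    """
    n_features = len(p_values_per_model[0])
    mean_p = [
        fmean([model[j] for model in p_values_per_model])
        for j in range(n_features)
    ]
    selected = [j for j, p in enumerate(mean_p) if p < p_threshold]
    return selected, mean_p


# Hypothetical p-values for 3 models and 4 features.
p_values = [
    [0.01, 0.40, 0.03, 0.20],
    [0.02, 0.60, 0.04, 0.04],
    [0.03, 0.50, 0.08, 0.03],
]
selected, mean_p = select_features(p_values, p_threshold=0.06)
print(selected)  # [0, 2]
```

Feature 3 in this example has a low p-value in two of the three models but is rejected because its mean is above the threshold, which shows how averaging favours features that are consistently significant.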
As previously mentioned, the Inventors used breathing sound data from Q=41 subjects. According to AHI severity these subjects were divided into four groups namely:
(i) Group 1, AHI<5, non-OSA,
(ii) Group 2, 5≤AHI<15, mild OSA,
(iii) Group 3, 15≤AHI<30, moderate OSA and
(iv) Group 4, AHI≥30, Severe OSA.
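The grouping above, and the binary labels used at each AHI threshold, can be expressed as a short sketch. The function names are hypothetical, and treating a subject as OSA when the AHI is greater than or equal to the threshold is an assumption chosen to agree with the group boundaries listed above.

```python
def severity_group(ahi):
    """Map an AHI value to the four severity groups listed above."""
    if ahi < 5:
        return 1  # non-OSA
    if ahi < 15:
        return 2  # mild OSA
    if ahi < 30:
        return 3  # moderate OSA
    return 4      # severe OSA


def osa_label(ahi, ahi_threshold):
    """Binary label Y for a given threshold (assumed: OSA if AHI >= threshold)."""
    return 1 if ahi >= ahi_threshold else 0


for ahi in (3.2, 5.0, 14.9, 22.0, 41.5):
    labels = [osa_label(ahi, t) for t in (5, 15, 30)]
    print(ahi, severity_group(ahi), labels)
```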
Table 1 sets out the demographic details of the subjects in the database for four subject groups.
Comparison Between Different File Formats
One of the Inventors' objectives was to evaluate the effect of data compression on the classifier performance. For this purpose the nocturnal breathing sound audio data was recorded from subjects in a raw audio data format, the WAV format. The data was then converted using Adobe Audition™ into FLAC (a loss-less audio format) and MP3 (a lossy audio data format).
The average length of the audio data recordings from the Q=41 subjects was 7 hours and 4 minutes, with a standard deviation of 1 hour and 38 minutes. The average sizes of an audio data recording at Fs=44100 Hz were 2.25±0.24 gigabytes for a WAV file, 0.95±0.11 gigabytes for a FLAC file and 0.61±0.06 gigabytes for an MP3 file. On average, a FLAC audio data file at Fs=44100 Hz was 58±5% smaller than the corresponding WAV file and an MP3 audio data file was 73±0.04% smaller than the WAV file.
The Inventors investigated a snore sound waveform and its spectrogram using different audio file formats and at different sampling rates. They found no difference between the WAV file format and the FLAC file format, in either the time domain or the frequency domain, at any of the sampling rates. With respect to the MP3 audio file, no obvious changes could be seen in the time domain signal, but a clear attenuation of the higher frequencies could be seen in the spectrogram. This high frequency attenuation could only be seen at Fs=44100 Hz and was not present at Fs=8000 Hz or 2000 Hz.
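As a rough check of these figures, the arithmetic can be worked through as follows. The sample width (16 bits) and the channel count (mono) are assumptions, since the specification does not state them here. Under those assumptions the uncompressed size of an average-length recording comes out close to the reported WAV size, and the reported mean file sizes reproduce the quoted percentage reductions.

```python
# Average recording length: 7 hours 4 minutes.
duration_s = 7 * 3600 + 4 * 60
fs = 44100              # samples per second
bytes_per_sample = 2    # assumption: 16-bit PCM
channels = 1            # assumption: mono recording

wav_bytes = duration_s * fs * bytes_per_sample * channels
wav_gb = wav_bytes / 1e9
print(f"Estimated WAV size: {wav_gb:.2f} GB")


def percent_smaller(size, reference):
    """Percentage by which `size` is smaller than `reference`."""
    return 100.0 * (1.0 - size / reference)


flac_reduction = percent_smaller(0.95, 2.25)
mp3_reduction = percent_smaller(0.61, 2.25)
print(f"FLAC is about {flac_reduction:.0f}% smaller than WAV")
print(f"MP3 is about {mp3_reduction:.0f}% smaller than WAV")
```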
Classification Results—Comparison Between File Format at Fs=44100
As previously discussed, the LRMs were trained using Ψc feature vectors, which were derived from MFCC and NGS, following a K-fold cross validation technique to classify patients into OSA and non-OSA. The LRMs were trained to classify patients into OSA and non-OSA at the different AHI thresholds of [5; 15; 30]. The LRMs were initially trained using all features and were then retrained using a selected sub-set of features.
Table 2 gives the test classification results for OSA diagnosis at different AHI thresholds optimized for epoch lengths. These results are for audio data sampled at Fs=44,100 Hz.
It will be observed from Table 2 that there is no difference in classification accuracy between WAV and FLAC audio data at any of the AHI thresholds. When selected features are used for model training, the WAV and FLAC audio formats have classification sensitivities/specificities of 94/86%, 83/91% and 100/93% at AHI=5, 15 and 30 respectively. Classification results using the MP3 audio data format were slightly lower than those for the WAV/FLAC audio data formats. The sensitivities/specificities for the MP3 data were 88/86%, 83/87% and 92/89% at AHI=5, 15 and 30 respectively.
Classification Result—Effect of Sampling Frequency
As previously discussed, the patient sounds may be resampled with different sampling frequencies Fs=[22050; 11025; 8000; 6000; 4000; 2000; 1000; 500; 200;] Hz. Note that audio data is initially recorded at Fs=44100 Hz. MFCC features were then computed with resampled data.
The results indicate that methods according to embodiments of the present invention can classify patients into OSA and non-OSA at different AHI thresholds with a high accuracy.
In the past several researchers [10-16] have attempted to use snoring sounds to diagnose OSA and many of the existing methods [10-12, 14] have depended on the identification of snore segments from the overnight sound data. Hence if the snore segmentation algorithm fails to identify any snore segments or if the patient did not snore then results of the test will be indeterminate.
In contrast to those previous methods that have relied on detection of snore segments in the patient sound for subsequent diagnosis of OSA, preferred embodiments of the invention described herein capture the instantaneous characteristics of the upper airway present in continuous recordings of the breath sound.
Furthermore, preferred embodiments of the invention make use of MFCC features for the diagnosis of OSA via measuring the amount of deviation of MFCC features from Gaussianity in a given sound segment (“epoch”). This approach has the advantage of better performance, robustness against AHI variation and low computational complexity as it does not depend on identifying snore segments from breath sound data.
The Inventors' results also illustrate that it is possible to record the patient sounds, i.e. the sounds of the patient breathing, with a compressed audio format and at a low sampling rate without compromising on classification accuracies. The results show that it is possible to achieve a sensitivity/specificity of 97/86%, 94/83% and 92/89% respectively at AHI threshold of 5, 15 and 30, with breath sound data recorded using Mp3 file format at Fs=6000 Hz (
Previously in
A dedicated OSA diagnostic apparatus 100 for diagnosing the presence of Obstructive Sleep Apnea (OSA) of a patient 2 is illustrated in
The apparatus 100 includes an epoch identification assembly 128 that is coupled to an output side of the pre-emphasis assembly 126 to process the digitized audio file and identify a number of epochs in the audio file. A sub-segment identification assembly 130 is provided that is arranged to process the digitized audio file and identify a plurality of sub-segments therein for each of the epochs.
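One way to picture the epoch and sub-segment identification is as index arithmetic over the samples of the digitized audio. The sketch below is illustrative only. The function names, the epoch and sub-segment durations, the sampling rate and the use of consecutive non-overlapping segments are all assumptions and are not taken from the specification.

```python
def segment_bounds(n_samples, segment_len):
    """Return (start, end) sample indices of consecutive, non-overlapping
    segments. A trailing partial segment is dropped."""
    return [
        (start, start + segment_len)
        for start in range(0, n_samples - segment_len + 1, segment_len)
    ]


def epochs_and_subsegments(n_samples, fs, epoch_s, subsegment_s):
    """Identify epochs and, within each epoch, its sub-segments."""
    epoch_len = round(epoch_s * fs)
    sub_len = round(subsegment_s * fs)
    result = []
    for e_start, e_end in segment_bounds(n_samples, epoch_len):
        # Sub-segment bounds are computed relative to the epoch, then offset.
        subs = [
            (e_start + a, e_start + b)
            for a, b in segment_bounds(e_end - e_start, sub_len)
        ]
        result.append(((e_start, e_end), subs))
    return result


# Hypothetical values: 25 s of audio at 8000 Hz, 10 s epochs, 0.1 s sub-segments.
layout = epochs_and_subsegments(
    n_samples=25 * 8000, fs=8000, epoch_s=10, subsegment_s=0.1
)
print(len(layout))        # number of complete epochs
print(len(layout[0][1]))  # sub-segments in the first epoch
```

In a pipeline of this shape, one set of MFCCs would be computed per sub-segment and one deviation score per MFCC would be computed per epoch.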
The sub-segment ID assembly 130 and the Epoch ID Assembly 128 provide respective outputs to the Mel-Frequency Cepstral Coefficient generator 132 which processes the digitized audio file from the pre-emphasis assembly 126 to produce a multiplicity of mel-frequency cepstral coefficients (MFCCs) signals for each of the sub-segments.
A non-Gaussianity Score calculation assembly 134 is provided that is responsive to the Mel-Frequency Cepstral Coefficient generator and which is arranged to process the MFCC signals from the MFCC generator 132 for each of the sub-segments to produce NGS scores for each of the MFCCs signals for each epoch as identified by the Epoch ID Assembly 128. In other embodiments of the invention a deviation from probability distribution score calculation assembly may be used to calculate a score for deviation from another distribution other than Gaussian.
The output from the NGS calculator 134 is passed to a comparator 136 which compares the NGS score for each of the MFCCs to a threshold value and outputs a “1” if the score is above the threshold or a “0” otherwise.
The output from the comparator is summed and averaged by Sum-and-Average block 138 to produce an initial test-vector which is subsequently reduced in dimension by Component Reduction assembly 140 to produce a reduced MFCC feature test vector. The reduced MFCC feature test vector is then passed to a decision machine block 142 which generates an OSA/non-OSA signal in response to the reduced MFCC feature test vector.
The apparatus 100 includes a human-machine interface including diagnostic display 146 that is coupled to the decision machine block 142 and which is arranged to present the OSA diagnosis to a human.
Whilst the previous discussion focused on a method and apparatus according to a preferred embodiment of the invention that uses deviation from a Gaussian distribution, other measures of deviation from a known statistical distribution may also be used in other embodiments of the present invention and some of these are listed below. In other embodiments App 6 may include instructions for microprocessor 3 to implement each of the following statistical techniques as an alternative to determining deviation from a Gaussian distribution.
Results on the above methods 1-5 are set forth below.
Non-Segmentation Based OSA Classification Results (Scored Using 2007 Alternate Criteria):
Data statistics for 73 usable recordings:
Results Summary (RDI threshold=15):
Set 1 [Train and LOV=53 and Independent Test=20]:
Set 2 [Created after shuffling the training and test data; Train and LOV=53 and Independent Test=20]:
Set 3 [Created after shuffling the training and test data; Train and LOV=53 and Independent Test=20]:
iPhone Data Analysis:
Total iPhone dataset available=83
Scored with 2007 Alternate=81 [data recorded between 2010 and 2014]
Scored with 2012 Recommended=2 [data recorded from 2015 onward]
Non-Segmentation based analysis (Scored using 2007 Alternate criteria):
Total dataset available=81
Data statistics for 70 usable recordings:
Dividing Data into Training and Testing & objectively removing Noisy recordings
Buffer Size=8;
Set 1 [Train and LOV=50 and Independent Test=20]:
Set 2 [Created after shuffling the training and test data; Train and LOV=50 and Independent Test=20]:
Set 3 [Created after shuffling the training and test data; Train and LOV=50 and Independent Test=20]:
Set 1
Set 2
Set 3
iPhone Trained Model Tested on Android Dataset:
Set 1
Set 2
Set 3
In general terms, a method according to an embodiment of an aspect of the present invention comprises a method for diagnosing a malady of a patient from sounds of the patient. The malady may be OSA or a respiratory disease such as pneumonia or some other impairment from normal health that results in changes to the sounds that a patient produces. The method includes the steps of initially making a digital recording of the sounds of the patient, which may be done with a contactless microphone as previously discussed. The digital recording is processed by one or more suitably programmed electronic processors to extract a multiplicity of features for sub-segments of each of a number of epochs of the digital recording. Features comprising MFCCs have been discussed in detail but other features can also be used in other embodiments, such as pitch, entropy, formants, NGS and higher-order spectra-based features. The features are suitably stored in an electronic data storage apparatus such as an electronic or magnetic storage device or server or network accessible storage. The method then involves operating the processors for determining deviation scores from a probability distribution for each epoch based on the extracted multiplicity of features which are retrieved from the storage. In the preferred embodiment the probability distribution that is used is the Gaussian distribution but other distributions can also be used and have been previously mentioned in the results tabled above. The one or more processors then generate a test vector derived from the deviation scores which is then applied to a pre-trained decision machine which is implemented by the processors or on another data network accessible hardware platform. The decision machine that has primarily been discussed is an LRM but other decision machines, such as artificial neural networks, Bayesian decision machines and support vector machines, might also be used.
Finally a diagnosis of malady on the basis of the output from the decision machine is presented on a display under control of the processors, for example to a clinician in order that suitable therapy can be applied to the patient if a malady has been found to be present. For example, therapy may involve administration of antibiotics (for patients suffering from pneumonia), application of controlled air pressure (for patients suffering from OSA) and other appropriate therapies based upon the diagnosis.
The following references are each incorporated herein in their entireties by cross-reference.
In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. The term “comprises” and its variations, such as “comprising” and “comprised of”, are used throughout in an inclusive sense and not to the exclusion of any additional features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprise preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted by those skilled in the art.
Throughout the specification and claims (if present), unless the context requires otherwise, the term “substantially” or “about” will be understood to not be limited to the value for the range qualified by the terms.
Any embodiment of the invention is meant to be illustrative only and is not meant to be limiting to the invention. Therefore, it should be appreciated that various other changes and modifications can be made to any embodiment described without departing from the scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2018903933 | Oct 2018 | AU | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/AU2019/051135 | 10/17/2019 | WO | 00 |