The present invention relates to a method for diagnosing sleep disorders. More specifically, a preferred embodiment of the present invention is suitable for diagnosing obstructive sleep apnea in a subject with reference to sounds made by the subject.
Obstructive Sleep Apnea Hypopnea Syndrome (OSAHS) is a highly prevalent disease where snoring is one of the most common symptoms. OSAHS occurs during sleep dusrse to collapsing of upper airways leading to cessation of breathing. The narrowing of airways and the relaxed muscles surrounding the airways generally results in snoring. OSAHS is a classic example for a treatable disease, which, by it self is not life-threatening. However, if left untreated, it causes an avalanche of devastating complications such as reduction in cognitive function, cardiovascular diseases, decreased quality of life, fatigue and increased vulnerability to accidents [1].
Full closure of the airways is known as obstructive sleep apnea and a partial closure is defined as obstructive hypopnea. OSAHS severity is expressed by the Apnea-Hypopnea-Index (AHI), which computes the number of Apnea and Hypopnea events per hour, averaged over the total sleep time. The following conventions are currently in clinical use: AHI<5 (no OSAHS), 5<AHI<15 (mild OSAHS), 15<AHI<30 (moderate OSAHS) and AHI>30 (severe OSAHS). Some researchers also use AHI=10 as the threshold for diagnosing the disease.
The current accepted diagnosis for OSAHS is by supervised overnight Polysomnography (PSG) to record a multitude of physiological signals for manual analysis at a sleep laboratory. Thus it is very expensive and time consuming. Limited PSG facilities around the world have resulted in long waiting lists and over 80%-90% individuals with OSAHS currently remain undiagnosed in Australia [2].
There has been a flurry of recent activities at developing technology to address this need [3 and references therein] for simplified diagnostic methods. The main focus of these activities was the development of ambulatory sleep instrumentation hardware.
The National Sleep Disorders Research Plan [4], USA, 2003, reports that the single-signal and full PSG systems meant for home monitoring yielded mixed results, and that none was sufficiently effective for routine use. One of the main problems faced by such devices is that the service of an experienced medical technologist is still required at the site of the test, if acceptable sensitivity/specificity performance is required.
It is an object of the invention to provide a method and apparatus for diagnosing a sleeping disorder in a subject that is an improvement, or at least a useful alternative, to the previously surveyed approaches of the prior art.
According to a first aspect of the present invention there is provided a method for diagnosing a sleeping disorder of a subject including:
In a preferred embodiment of the present invention the parameter quantifying deviation from Gaussianity distribution comprises a non-Gaussianity index.
The method may include the steps of:
It is preferred that the step of calculating the non-Gaussianity scores includes calculating a normal probability of each said segment of the digitized audio signal.
In a preferred embodiment of the invention the method includes comparing the non-Gaussianity scores to a predetermined threshold value.
Preferably the step of computing the non-Gaussianity index based on said scores includes taking into account the comparison of the scores to the predetermined threshold value.
The one or more parameters may further include one or more of: pitch parameters, higher order spectrum parameters, cepstral coefficients and slice parameters.
Preferably the cepstral coefficients are mel-frequency cepstral coefficients.
The slice parameters may comprise diagonal slice parameters.
The higher order spectrum parameters may comprise bispectrum parameters.
It is preferred that the step of processing the digitized audio signal from the subject includes dividing the digitized signal into a plurality of segments.
Preferably the method includes identifying said segments indicating voiced snoring prior to the step of estimating values for the one or more parameters.
According to a further aspect of the present invention there is provided an apparatus for diagnosing a sleeping disorder of a subject including:
According to another aspect of the present invention there is provided an apparatus for diagnosing a sleeping disorder of a subject including:
In a preferred embodiment of the invention the assembly includes a data logger to record the digitized audio signal and to record the signals indicating values for said sleep-related parameters.
The assembly may include a segmentation module arranged to segment the digitized audio signal, said segmentation module being disposed in a signal path between the analog to digital converter and the one or more sleep-related parameter calculation modules.
It is preferred that the sleep-related parameter calculation modules further include one or more of: a total airways response calculator; a pitch calculator; and a cepstral coefficient calculator.
The module arranged to calculate a parameter indicating deviation from Gaussiantity distribution of the digitized audio signal preferably comprises a non-Gaussianity Index (NGI) calculator.
A pre-emphasis module may be coupled to an input side of the NGI calculator for filtering the digitized audio signal prior to processing by the NGI calculator.
According to a further aspect of the present invention there is provided a computer software product comprising a program storage device, readable by machine, tangibly embodying a program of non-transitory instructions executable by the machine to cause the machine to perform a method according to claim 1.
According to another aspect of the invention there is provided a method for producing a diagnostic model to identify a sleep disorder including:
The diagnostic model is preferably a logistic regression model and the step of calculating classification model parameters from said values includes applying logistic regression analysis to the values.
Preferred features, embodiments and variations of the invention may be discerned from the following detailed description of sections V. onwards, which provides sufficient information for those skilled in the art to perform the invention. The detailed description is not to be regarded as limiting the scope of the preceding Summary of the Invention in any way. The detailed description will make reference to a number of Figures throughout as follows.
a. A graph displaying NGI versus AHI for the 86 test and train subjects, for optimum L and β values of 25 s and 0.54, respectively. The NGI threshold for OSA, non-OSAHS discrimination was found to be (ψthresh=7.61). The vertical dashed line marks the (AHI=5) boundary and the horizontal dashed line marks the (ψthresh=7.61) diagnostic edge. The false negative (FN) and false positive (FP) error regions are have been marked on each displayed plot.
The basic premise of the inventors' work is that snore signals carry vital information which enables the state of the upper airways to be characterized. The inventors have realized that the upper airway acts as an acoustic filter during the production of snoring sounds, just as the vocal tract does in the case of speech sounds. Episodes of sleep disorders and sleep dysfunctions such as OSAHS are, by definition, associated with partial or full collapse of upper airways. As such, the inventors have realized that changes to the upper airways brought about by this collapse should be embedded in snore signals. More particularly, the Inventors have found that a parameter quantifying deviation from Gaussianity distribution of the patient's sounds, in the form of a non-Gaussianity distribution index, may be used in embodiments of the invention to assist in diagnosis of sleep dysfunction such as OSAHS.
Snore related sounds (SRS) were initially acquired for later use in training and testing a method according to an embodiment of the invention. Initially snore related sounds (SRS) were acquired simultaneously, but independently, during a routine PSG session at the Respiratory and Sleep Disorders Unit of the Princess Alexandra Hospital, Brisbane, Australia. A pair of low noise microphones having a hypercardiod beam pattern (Model NT3, RODE®, Sydney, Australia) were placed at a nominal distance of 50 cm from the mouth of the patient. A professional quality pre-amplifier and A/D converter unit (Model Mobile-Pre USB, M-Audio®, California, USA) was used for sound signal acquisition, at a sampling rate of 44.1 k samples/s and a bit resolution of 16 bits/sample. The system had a 3 dB bandwidth of 20800 Hz at the sampling rate used. Although it is generally accepted that the bandwidth of snore sounds is about 10 kHz, the sampling rate was kept at 44.1 k samples/s in order to obtain the best sound quality, and to capture, without distortion, any speech segments embedded in SRS sounds.
All night SRS data from 41 subjects (13 normal, 28 OSAHS) was initially studied. Data from each subject was classified into 100 ms segments of voiced non-silence, unvoiced non-silence and silence using an automatic SRS data segmentation method [12]. Following definitions D1-D3 in section VII A., these 100 ms segments were then grouped into episodes. The resulting 39,434 episodes were manually screened for bed noise, Catathrenia (nocturnal groaning) and speech.
From each subject up to a maximum of 100 snore episodes were randomly selected. Pitch and TAR of each 100 ms voiced snore segment within these snore episodes were then estimated using the HOS based method. In some subjects less than 100 episodes of snoring were observed (see Table 1, column 7, where NOE stands for the Number of Episodes analysed). Table 1 shows the summary of subject characteristics and snore episodes used in model derivation.
Previously [11] formalised objective definition of snoring sounds are referred to in this document as defined via (D1)-(D3) below:
(D1) ‘Breathing Episode’ (BE) is defined as the SRS data from the start of an inspiration to the corresponding end of expiration. This period may also contain small segments of silence, for example, between the end of inspiration and the start of expiration (e.g. time interval 1.7-1.9 in
(D2) ‘Snoring Episode’ (SE) is defined as a Breathing Episode with at least one portion of it containing sound with a detectable pitch. The part with detectable pitch is termed a ‘voiced snoring segment’. The rest of the Snoring Episode containing sound without pitch is classified as ‘unvoiced snoring segment’. Just like in a Breathing Episode, there can be ‘silent segments’ inside the Snoring Episode. A snoring Episode must consist of at least one voiced-snoring segment (by definition).
(D3) A Breathing Episode that is not a Snoring Episode is called a Pure-Breathing Episode (PBE).
Snoring segments s(n) were modelled as;
s(n)=h(n)∘x(n), (1)
the convolution between: (i) ‘source signal x(n)’ representing the source acoustical energy and, (ii) ‘(TAR) Total Airway Response h(n)’ which captures the acoustical features of the upper airways. In the case of voiced-snore, the source excitation x(n) is best modeled as a pseudo-periodic sequence having its origins in the vibrations of the upper-airway structures. Higher Order Statistics (HOS) based methods have been previously described in [11] to jointly estimate the pitch of the source (pseudo-periodic sequence) and the TAR of voiced snore segments. Appendix A briefly describes the theoretical derivation of the higher-order spectrum (HOS) based algorithm for joint estimation of pitch and TAR. Pitch and the TAR are used to generate snore parameters (independent variables) for models that are used in various embodiments of the present invention.
Each snore episode (SE), by definition D2, has at least one segment within it with a definable pitch (periodicity). Within a SE, pitch, the fundamental frequency of vibration changes with time.
In the following, the Pitch and TAR are jointly estimated however, in alternative embodiments the pitch (periodicity) estimation may use any suitable periodicity technique, such as autocorrelation, cepstrum analysis or wavelets etc, and work directly on the digitized audio signal data rather than on the TAR.
i) Descriptive Statistics from the Pitch Time Series:
Pitch values computed from episode “j” can be considered as a pitch-time series {pj(k)}. Descriptive statistics of the pj(k) are used as a parameter in the snore-based model as follows:
These parameters shall be referred to as Pitch parameters in the remainder of this document.
ii) Parameterization of the Total Airways Response (TAR):
In order to quantify the shape of the TAR three different methods were employed.
In the first two methods the bispectrum Ck(f1, f2) of the TAR (See Appendix B for details) of kth voiced snore segment within the snore episode j, were estimated. The sum of magnitude Bk(fn) (
and Dk(fn) as
Dk(fn)=Ck(fn,fn) (3).
Since Bk(fn) and Dk(fn) are one dimensional distributions of amplitude as function of frequency representing the spectral content of TAR the following parameters were estimated from them
where Xk(fn) represents Bk(fn) and Dk(fn) and n=1:N number of frequency bins in Bk(fn) and Dk(fn). Three values 500,800 and 1000 were used for frequency bands “b” to estimate R500k, R800k, R1000k, RO500k, RO800k and RO1000k.
For the jth snore episode, similar to pitch-time series {pj(k)}, 18 time series represent the TAR. From each of these time series the mean and variance were estimated. In the rest of this document these parameters are referred to as Bispectral (BS) parameters and Diagonal slice (DS) parameters respectively. While the use of bispectral parameters is preferred and discussed in detail herein, any higher-order spectrum, i.e. order greater than 2, might be employed. For example, trispectrum parameters might be used instead of the bispectrum parameters. Similarly, while it is preferred that diagonal slice parameters be used, any slice except for one w parallel to w1=0, w2=0 or w1+w2=0 in the (w1, w2) plane can be used. The diagonal slice makes use of the (w1, w1) slice.
It is known that the human auditory system is able to hear linearly spaced tones below 1 kHz and logarithmically spaced tones above 1 kHz. Motivated by this observation, Mel-Frequency Cepstral Coefficients have been used in speech recognition applications. The inventors, wishing to provide a method for recognizing the OSAHS patients from snore sounds, estimated the Mel-Frequency Cepstral Coefficients of the TAR (MFCC Parameters) as the next set of parameters. Thirteen Cepstral coefficients (MFCC1j . . . MFCC13j) were estimated using a triangular Mel-filter bank. While it is preferred that mel-frequency cepstral coefficients be used, in general other cepstral coefficients may also be used to implement less preferred embodiments of the invention.
A model using snore parameters, capable of classifying the two groups of snorers, subjects with low AHI (<10) and subjects with OSAHS, was now derived. To this end, there was now a pool of independent variables and one dependent variable with categorical outcomes, namely OSAHS and the non-OSASH snorers.
Logistic Regression analysis (LRA) and Linear Discriminant analysis (LDA) are two of the most widely-used statistical methods for analysing categorical dependent variables. To develop linear classification models, both these methods are appropriate. However, LDA has been developed for normally-distributed independent variables. Also, during the LDA, it is necessary to assume that the underlying data is either linearly related, or have equal within-group variances. On the other hand, LRA makes no assumptions about the distribution of the independent variables. It is reported in the literature that even though LDA and LRA have similar results when the normality assumptions are not violated, LDA is not suitable for all other cases (Pohar et al. 2004). Hence, it is assumed that LRA is a more flexible and more robust method in the case of violations of these assumptions.
LRA is the standard method frequently used for clinical classification problems[13, 14, 15]. Also because it was not feasible to check continually for the normality of the independent variable data and the relative ease of interpreting the results; LRA was settled on to classify the two groups.
In the following discussion, the dependent variable Y is assumed to be equal to “one” (Y=1) in subjects with OSAHS and “zero” (Y=0) in snorers with low AHI (<10).
A model was derived using logistic regression to estimate the probability Y=1 given the independent variables as follows;
where l=(l=1, 2, . . . L) is the observation number and M is the number of independent variables from lth observation in the model. βm (m=0, 1, . . . M) are the model parameters estimated by the maximum likelihood method.
Snore parameters of 10 snores from each subject were averaged to derive independent variables xml for regression model estimation. This resulted in maximum of 10 observations (L<=10) from each patient being used to derive the model.
The statistical software package SYSTAT 12® was used to carryout logistic regression analysis. The package allows the option of entering all available (M) independent variables corresponding to L observations and carryout stepwise analysis. The final model will include only the best independent variables that facilitate the classification.
The final model is then used to estimate the probability Pl for L observations and each observation is classified as belonging to any one of the two groups using a probability threshold Pth. The lth observation is classified as OSAHS positive if Pl≥Pth. Next each subject is classified in to the group where a majority of its observations fall into.
Model Validation
The general and simple approach for robust clinical validation of a classification model is to use the holdout method, where the available data are split into two sets, one to train the classifier and the other to test it. However holdout method of validation was not adopted in this study due to two reasons. The first being the sparse nature of the data sample (Table 1) and the other, the possibility of an unfortunate split where the error would be a maximum. The best validation method to suit the available data set was K-fold cross validation where each subject to be classified was left out from the training data set.
If the set of observations from rth patient Xr={xml; l=1, 2 . . . L} and r=1, 2, . . . 41 (number of subjects in the database). The following steps describe the validation process;
This method also makes it possible to use most of the available data in both model derivation and validation process.
Model Performance Evaluation
In a diagnostic test using any classification model, there are four possible outcomes with respect to the reference standard classification. They are: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN). From these outcomes the measures of classification performance, Sensitivity, Specificity and Accuracy are estimated.
It is possible to obtain maximum sensitivity and specificity in classification of subjects by changing the probability threshold Pth until an optimal Pth is reached. In order to find this optimal Pth, the Receiver Operating Characteristic (ROC) curves were plotted by estimating sensitivity and specificity values for possible Pth values (0<Pth<1 in steps of 0.01). As a measure of classification power of the model, the Area Under the Curve (AUC), was also estimated.
Three possible Pth values were considered, corresponding to different model performance points in the ROC curve as follows:
Pth1—The Pth at optimal model performance point in ROC curve (point with shortest distance to 100% sensitivity and 100% specificity performance)
Pth2—The Pth at 100% sensitivity (with maximum specificity) model performance point in ROC curve
Pth3—The Pth at 100% specificity (with maximum sensitivity) model performance point in ROC curve.
We also estimated the Positive Predictive Value (PPV) and the Negative Predictive Value (NPV). PPV and NPV would provide insights to how the model would perform in a clinical application.
Using the methods described in section VI four different models P1, P2, P3 & P4 were derived from Pitch parameters, BS parameters, MFCC parameters and DS parameters respectively.
The three models, the parameter groups used to derive them and their performance are shown in the Table 2 and their ROC curves are shown in
It is evident from
A. Model Validation
K-fold cross validation process was carried out to evaluate model C. The classification performance was evaluated for each subject using a model, where the parameters βm have been derived by excluding all SE data of that subject. As stated in section VI, the performance of the model was evaluated at three different Pth values. The Table 3 shows the performance of the model and the validation process at each of these Pth values. The corresponding ROC curve for leave-one-out validation is shown in
Referring now to
Central processing unit (CPU) 70 interfaces with storage devices that are readable by machine and which tangibly embody programs of instructions that are executable by the CPU. These storage devices include RAM 62, ROM 64 and secondary storage devices i.e. a magnetic hard disk 66 and optical disk reader 48, via mainboard 68. The personal computer system also includes a USB port 50 for communication with an external ADC module 51 which pre-amplifies, filters and digitises signals from microphones 53 and 55. The microphones pick up sleeping sounds from a subject 42 lying on bed 40. The microphones do not touch the subject. In some circumstances one microphone may touch the bed to capture bed noise. Furthermore, one of the microphones may be a directional microphone that is used to measure background noise so that cars, barking dogs etc can be cancelled out of the digitized audio signal of the sleeping subject that is to be processed, via standard signal processing techniques.
Secondary storage device 66 is a magnetic data storage medium that bears tangible instructions, for execution by central processor 70. These instructions will typically have been installed from an installation disk such as optical disk 46 although they might also be provided in a memory integrated circuit or via a computer network from a remote server installation. The instructions constitute a software product 72 that is loaded into the electronic memory of RAM 62. When executed the instructions cause computer system 52 to operate as an OSAHS diagnosis system and in particular to implement an OSAHS diagnosis method that will be described shortly with reference to the flowchart of
It will be realised by those skilled in the art that the programming of software product 72 is straightforward in light of the method of the present invention, a preferred embodiment of which will now be described. In the following method various variables are manipulated. It will be realised that during operation of computer system 52 to implement the method corresponding registers of CPU 70 will be incremented and data written to and retrieved from secondary storage 66 and RAM 62 by virtue of electrical signals travelling along conductive busses etched on mainboard 68. Consequently, physical effects and transformations occur within computer system 52 as it executes software 72 to implement the method that will now be described.
With reference to
At box 104 the mean pitch mePj and the Skewness of the pitch values sqPj is calculated as discussed in paragraph D.(i) above. At box 106 the Bispectrum Parameters (BS) mean of RO800 and the variance of RO1000 are calculated as discussed in D.(ii) above. At box 108 the Diagonal Slice parameters (DS) being the mean of the center frequency Fck, variance of R1000 and mean of R800 are calculated as discussed in D.(ii) above.
At box 110 the 1st, 8th, 10th and 11th coefficients of the Mel-Frequency Cepstral Coefficients (MFCC) are calculated.
At box 112 the parameter values that have been calculated in boxes 104, 106, 108, 110 and 111 are applied to the pre-calculated obstructive sleep apnea (OSAHS) model. The OSAHS model is the logistic regression model discsussed in section E. above. That model uses model parameters derived from training data gathered from subjects suffering from either simple snoring or OSA. Other classification models may also be suitable. For example although less preferred, discriminant analysis may be used. Alternatively a decision machine such as a Support Vector Machine might also be used to classify the calculated parameters as either being associated with a subject suffering from OSAHS or not.
Application of the calculated parameters to the model at box 112 results in a probability value being returned. At box 114, if the probability value is greater than or equal to a predetermined threshold then the subject is deemed to be suffering from OSAHS and a corresponding message is displayed, for example on display 56 (
As an alternative to the embodiment shown in
A user control interface 138 is provided to allow an operator to vary parameters of the various modules as indicated by the dashed lines in
The display controller 140 and display 142 are also arranged to
Other alternatives are also possible. For example, a low cost audio record-only device may be located near the subject 42. Upon sufficient audio having been recorded from the subject the audio data may then be transferred to a remotely located facility for processing according to the previously described method. Another alternative is to provided a miniature wearable microphone-based device, for example with Bluetooth connectivity to a recorder or a transmission device. In this regard a cellular phone may be a convenient way of transmitting the audio data to the remote facility for processing.
Embodiments of the present invention may be used in conjunction with facility-based diagnosis of OSA. For instance, in combination with an EEG channel to measure the neural dimension of OSA, while the snore sounds capture the mechanical/respiratory features of OSA. A much simpler device, which would operate on less than five subject data channels for example, would then be possible as opposed to the current in-facility equipment that often makes use of more than fifteen channels. In addition such an EEG+Snore system might be fully automated thereby removing the need for manual analysis of the signals.
Previously, with reference to box 111 of
The inventors have hypothesized that snore sounds are non-Gaussian signals, i.e. they do not exhibit a Gaussian distribution, and have conceived of a non-Gaussianity Index (NGI) for the diagnosis of OSAHS via measuring the amount of deviation of snore data from Gaussianity. Accordingly
By way of explanation, the method has been developed by initially modeling the kth segment {tilde over (x)}jk[n] of a digitized overnight Snore Related Sound (SRS) signal, {circumflex over (x)}j, recorded from the jth arbitrary subject, as:
{tilde over (x)}jk[n]={tilde over (s)}jk[n]+{tilde over (b)}jk[n], (10)
{tilde over (s)}jk[n] is the true SRS signal of interest, and {tilde over (b)}jk[n] represents noise and background sounds unrelated to the SRS. The index n=1, 2 . . . N is the sample number within the segment k, and, N denotes the total length of the segment in number of samples. Note that the overall recording {tilde over (X)}2 of the patient j can then be expressed as the concatenation of non-overlapping segments as {tilde over (X)}j={{tilde over (x)}j1[n]|{tilde over (x)}j2[n]| . . . |{tilde over (x)}jk[n]| . . . |{tilde over (x)}jK[n]}, where k=1, 2 . . . K indicates the segment number, and K is the total number of segments in the overnight recording.
{tilde over (b)}jk[n] is modeled as the combination of Gaussian and non-Gaussian components. The Gaussian component mainly consists of the electronic noise contributed by the instrumentation. The non-Gaussian component, however, is other background sounds such as speech and movement related sounds. The desired signal, {tilde over (s)}jk[n], constitutes of three components [20], i.e. Pure Breathing Episodes (PBE), Snoring Episodes (SNE) and inter-episode silence, which have been previously defined.
Both PBE and SNE episodes may include segments of intra-episode silence, mostly as a pause of breathing between inspiration and expiration cycles. By the above definition, SNEs require the presence of periodic structures within episodes, whereas PBEs require the absence of periodicity. Such periodic structures (voiced-components of snoring) tend to make the SNE data non-Gaussian. The noise-like non-periodic components in SNEs and PBEs are similar to unvoiced-components of speech. This is illustrated in the graph of
Following the source/vocal-tract model in speech synthesis, the SRS synthesis process is modeled [36] as shown in
The approach taken is to quantify the deviation from Gaussianity of SRS data and use the resulting index as a measure for OSAHS severity.
The SRS signal {tilde over (X)}j recorded at the microphone (box 200 of
The preprocessed {tilde over (X)}j will be denoted by Xj and used in further processing as described below and in the flowchart of
i) Pre-Emphasis Filtering:
A pre-emphasis filter is used to boost the overall energy of the SNEs and improving the signal-to-noise ratio at higher frequencies. The parameter α □ in equation (11) can be adjusted to set the filter to the desired cut-off frequency. Pre-emphasis filtering is carried out with a typically set to the range of 0.96 to 0.98.
xjk[n]={tilde over (x)}jk[n]−α{tilde over (x)}jk[n−1] (11)
ii) Background Noise and Interference Reduction:
The background noise ({tilde over (b)}jk[n]) mainly consists of (electronic) instrumentation noise and unwanted sounds independent of {tilde over (s)}jk[n] (e.g. night-time speech, groaning, duvet noise etc.). As typically followed in the literature, the electronic noise is treated as a Gaussian process [21, 24]. The method described in this section is centered on computing the distance of the observations from a Gaussian distribution, and thus electronic noise theoretically does not affect the calculations. Furthermore, low-noise sound acquisition systems, like the one employed in the recording process, are now widely available, and the SNR can be further improved by filtering (e.g. pre-emphasis and low-pass). For these reasons the contribution of electronic noise to {tilde over (b)}jk[n] can be safely ignored.
Unwanted background sounds such as speech and groaning may be observed in an overnight snore recording made in the hospital. These sounds are highly concentrated in the first and last 30 minutes of the recording, when the patient is awake, and conversation between the patient and the attending medical technologist occurs. The first and last 30 minutes of data are removed (box 202
The pre-processed SRS signal, Xj, is used to derive a new segment per segment measure, called the Non-Gaussianity Score (NGS), which is a quantitative measure of the deviation from Gaussianity of a segment of Xj, xjk[n]. The method is centered on computing the Normal Probability Plot [38].
i) Normal Probability Plot for the Segment xjk[n]:
The normal probability plot is a plot of the midpoint probability positions of a given data segment versus the theoretical quantiles of a normal distribution [24]. If the distribution of the data under consideration is normal, the plot will be linear. Other probability distributions will lead to plots that deviate from linearity, with the particular nature and amount of deviation depending on the actual distribution itself. The normal probability plot is often used as a powerful qualitative tool in visualising the ‘Gaussianity’ of a given set of data. A method is proposed to derive a quantitative measure for the deviation from Gaussianity, based on the normal probability plot of xjk[n].
To obtain the normal probability plot for a segment of the analysed SRS signal, xjk[n], the inverse of the normal Cumulative Distribution Function (γ) for the data is first calculated (at box 206 of
y=F−1(p|μ,σ)={y:F(γ|μ,σ)=p} (12)
where,
The probabilities were then assigned to p in (13) to obtain the required γ values for plotting the normal probability plot of the data in xjk[n]. However, the normal probability plot conducts a comparison between γ and probabilities of a would-be, or reference, Gaussian dataset, if xjk[n] was to represent a data segment with a Gaussian distribution. This is, thus, calculated (at box 208 of
ii) A Quantitative Measure for the Non-Gaussianity of xjk[n] (NGS, ψjk):
The deviation of the probability plot of xjk[n] from a straight line (i.e. the shape of a true Gaussian probability plot, g) was chosen as the segmental measure of non-Gaussianity, symbolised by ψjk. Linear regression [21] was utilised to estimate ψjk at box 210 of
Where, g[n] is the set containing probabilities of the reference normal data for segment xjk[n], and ψjk is the non-Gaussianity score directly mapped to the segment xjk[n]. It must be noted that due to use of linear regression the NGS values range such that, (0≤ψjk≤1).
After estimating ψjk for all k=1, 2 . . . K (at box 210 of
(S1) At box 210, the obtained NGS(ψjk) values are directly mapped to each analysed segment, xjk[n], as an obtained score representing all samples of the segment of data. These scores are referred to as ψjk, to indicate the non-Gaussianity score of the kth segment of the jth subject.
(S2) At box 212 the ψjk values mapped to xjk[n] segments are thresholded using threshold value β. Define a new sequence yjk[n], n=1, 2 . . . N and k=1, 2 . . . K such that, for ψjk values above β, all samples of defined segments, yjk[n], are set to “1” (at box 214) while otherwise they are set to “0” (at box 216).
(S3) At box 218, once all yjk[n] segments are defined, the Xj SRS signal is partitioned into larger blocks consisting of concatenated and non-overlapping yjk[n] segments, with each block having a length of Q segments and thus NQ samples, in order to obtain blocks Bjl[m], where
and m=1, 2 . . . NQ. For simplicity, NQ, is referred to, which represents block length in number of samples, using L.
(S4) At box 220, the summation of the L samples within each Bjl[m] block is then obtained as Σm,∀lBjl[m]. If this value is equal to or above “1”, a new block Cjl[m] is defined as Cjl[m]=1 for
and m=1, 2 . . . L, otherwise, it is defined as Cjl[m]=0.
(S5) At box 226 the summation over all L samples for each Cjl[m] was then obtained as Σm,∀lCjl[m] and divided by the total length of the block, L, to achieve the total number of blocks within the Xj signal defined as Cjl[m]=1. This value was then divided by the total length of the analysed Xj signal, in hours, in order to obtain the overall non-Gaussianity Index (NGI), (equation 16) for the jth subject. This is referred to using symbol ψj, denoting the overall NGI of the jth subject, which is in turn utilised for diagnosis.
At box 228 the NGI for the jth subject is displayed on monitor 56 (
In particular, a non-contact, take-home non-Gaussianity apnea monitor incorporating the method may be provided in conjunction with the apparatus previously described in relation to
This section provides descriptions of the database utilised for the evaluation process, the testing and training methods employed during evaluation, and the obtained evaluation results.
The Database
The AJaPSG (Australia-Japan PSG) [32] database was used for the work of this paper. It consisted of snore sounds recorded from 86 subjects while undergoing routine PSG tests at the Sleep Disorders Unit of The Princess Alexandra Hospital, Brisbane, Australia.
The sound acquisition system consisted of a pair of matched low noise microphones having a hypercardiod beam pattern (Model NT3, RODE, Sydney, Australia). The nominal distance from the microphone to the mouth of the patient was 50 cm, but could vary from 40 to 70 cm due to patient movements. A professional quality pre-amplifier and A/D converter unit (Model Mobile-Pre USB, M-Audio, California, USA) was used for data acquisition, at a sampling rate of 44.1 kHz and a 16 bits/sample resolution. The system had a 3 dB bandwidth of 20,800 Hz at 44.1 kHz.
PSG data were gathered from a Compumedics (Sydney, Australia) sleep acquisition system. PSG measurements consisted of standard channels such as EEG, EOG, ECG, EMG, leg movements, respiration nasal airflow, nasal pressure, respiratory movements, blood oxygen saturation, breathing sounds and the body position. Expert human scorers analysed PSG data and computed the AHI as part of the standard diagnostic protocol. The database consisted of subjects with AHI ranging from 0.5 to 106 (AHI mean=29.85, AHI standard deviation=25.74), and was divided into the two subsets of testing and training. The training set was selected by randomly choosing 20 of the 86 subjects in such a way as to achieve an approximately uniform distribution for the training set (see
Pre-Emphasis Filtering of {tilde over (x)}j
In section IX the use of the pre-emphasis filter was discussed. The pre-emphasis filter is used to correct the roll-off observed in the spectrum of quasi-periodic signals such as snoring. This filter acts as a high pass filter, eliminating low frequency noise and, in turn, relatively boosting the energy of the higher frequency region of snore segments.
The response of the pre-emphasis filter is dependent on the parameter α, as was previously mentioned in section IX. This variable is commonly set between the range of 0.94 to 0.98, and little difference is observed between the values in this range.
A value of (α=0.96) was utilized to achieve a mid-point response between the commonly utilised values of 0.94 to 0.98. The results indicated that low frequency noise was attenuated and the energy of snore segments were boosted accordingly. An example of the effect of the pre-emphasis filter can be observed in
Computation of the Non-Gaussian Score (NGS)
It was discussed (in section IX) that a normal probability measure was utilized to obtain the ψjk scores for each analysed segment of data, xjk[n]. The success of normal probability analysis relies on differentiating a normally distributed process from a non-normally distributed one. The energy-normalised distribution templates for a segment of data taken from a breathing/silence episode, snoring episode from a non-OSA, and snoring episode from an OSAHS patient, along with their corresponding normal probability plots and NGS scores can be seen respectively in
The system was first trained using the training set previously discussed in section III-A. The training set consisted of overnight snoring sounds (6-8 hours duration) recorded from 20 subjects. The 20 subjects were randomly and in such a way as to achieve an approximately uniform distribution with respect to their associated AHI. This was done to avoid biased experimentation. The remaining 66 subjects were then used as the test set.
i) β Threshold Tuning:
Training was utilised to obtain an optimum β value for each L-length partitioning experiment. To do this, 10 different values were utilised for L. The L values were chosen in a such a way so as to correspond to: 1, 3, 5, 7, 9, 10, 15, 20, 25, and 30 seconds in length. This was done to mark Cjl[m] blocks as “success” (Cjl[m]=1) or “failure” (Cjl[m]=0), as was displayed in
Receiver operating curves (ROC) were used to optimise the method. ROC curves were chosen to analyse the behavior of the system with respect to threshold variation. ROC curves were calculated for β values (range of 0 to 1 with steps of 0.01) per L-length partitioning experiment. Areas under each of the ROC curves were then plotted against their associated β values, those values that gave an area of 1 under the curve, or otherwise known as, a detectability of 100%, were marked. The median β value of the set, which gave the maximum area, was chosen to leave a guard-band for the error caused by the trained β value used for testing, thus ensuring error minimisation and increased detectability for the future testing process.
ii) Evaluation Using Tuned β:
The obtained β values from the training process were then utilised to conduct testing. The testing of the system was only carried out on the chosen β value from the training set. Hence, only one ROC curve was obtained per L-length partitioning experiment. It must be noted that testing experiments were conducted independently to the training experiments. That is, the training set was analysed using the ROC curves to obtain the optimum threshold for that set only. This value was then employed to test the testing set and achieve the reported results. Only the β values were chosen using information obtained from the training data. The obtained ROC curves for each of the L-length partitioning experiments can be seen in
i) Effect of L in Optimization:
It is seen from
ii) Analysis of Optimized Results:
The L-length partitioning experiment at L corresponding to 25 s was analysed in detail to obtain further information.
As was seen in section III-E, the training data was utilised to obtain threshold values β and ψthresh for performing testing and evaluation based on specific error optimisations. Hence, experiments were carried out to analyse the performance of NGI when optimised for ppv or npv measures. The results were obtained and can be observed in TABLE V and VI. The tables indicate that the most accurate diagnosis, in either case, is obtained for NGI value corresponding to (AHI=10). In addition, the system performs at a higher specificity when optimised for the ppv measure, and a higher sensitivity when optimised for the npv measure.
The theoretical derivation of HOS based algorithm for joint estimation of pitch and TAR from the SRS signals. The third order cumulants [14] cs (τ, ρ) of the zero mean observation s(n) in equation (1) is defined as:
Cs(τ,ρ)=E{s(n)s(n+τ)s(n+ρ)}, (17)
where E{⋅} denotes the expectation operator.
The bispectrum Cs(ω1,ω2) [15] of s(n) is defined to be the Fourier transform of the third order cumulant Cs(τ,ρ). Converting (10) to the bispectrum domain the following result is obtained:
Cs(ω1,ω2)=Cx(ω1,ω2)H(ω1)H(ω2)H•(ω1+ω2), (18)
where Cx(ω1, ω2) is the bispectrum of x(n), H(ω) is the spectrum of h(n), “•” denotes conjugate
Now the problems at hand are to estimate (i) the TAR (h(n)), and (ii) the properties of the source excitation x(n) as relevant to the voiced and unvoiced snore segments.
In this document the bicepstrum [14], [15] is used, which is defined as the 2-dimensional cepstrum of the bispectrum. In the bicepstrum domain, from (11) the following result is obtained,
bs(m1,m2)=bx(m1,m2)+bh(m1,m2), (19)
where bx(m1, m2), bh(m1, m2) and bs(m1,m2) are the bicepstra of x(n), h(n) and s(n) respectively. Note that the multiplication in the bispectra domain has now become addition in the biceptra domain.
As known in the HOS signal processing community [14], [15], the slice bs(m,0) of the bicepstra bs(m1,m2) gives the 1-dimensional complex cepstrum of s(n), which can be used to reconstruct the signal s(n) or extract x(n) and h(n) through inverse 1-dimensional complex cepstra operations.
As described in section III B when the “source excitation” x(n) for voiced snoring is considered to be a pseudo-periodic pulse-train, the 1-dimensional complex cepstrum bx(m,0) of x(n) too consists of a periodic impulse train. The contribution bh(m,0) of h(n) to the complex cepstrum bs(m,0) of s(n) is non-periodic and can be easily separated from bs(m,0), through an appropriate window operation as defined by:
The symbol W(m; l) represents a rectangular window with its size parameterized by l. If the first occurrence of the repetitive, impulse peak of bs(m,0) is noted at the sample number m=L (corresponding to the pitch), one can set I=L−1 in (13). Applying W(m; l) to bs(m,0) we get,
bsw(m,0)=bhw(m,0)=bh(m,0), (21)
where bsw(m,0) and bhw(m,0) are the windowed versions of bs(m,0) and bh(m,0) respectively.
Based on (14), the following estimate {tilde over (h)}(n) of the TAR (h(n)) is obtained through inverse complex operations as defined by:
{tilde over (h)}(n)=F−1{eF{ĥ(m)}}, (22)
where F and F−1 respectively denote the Fourier transform and its inverse, and,
The third order cumulants [14] ch(τ, ρ) of the TAR (h(n)) is defined as:
Ch(τ,ρ)=E{h(n)h(n+τ)h(n+ρ)}, (24)
where E{⋅} denotes the expectation operator.
The bispectrum Ch(f1, f2) [15] of h(n) is defined to be the Fourier transform of the third order cumulant ch(τ, ρ). Converting (17) to the bispectrum domain results in:
Ch(f1,f2)=H(f1)H(f2)H•(f1+f2), (25)
H(f) is the spectrum of h(n), “•” denotes conjugate.
The references made to the following articles are not to be taken as any suggestion or admission that the articles form part of the common general knowledge. The content of each article is incorporated herein in its entirety by reference for all purposes.
In the present specification and claims, the word “comprising” and its related and derivative terms, including “comprises” and “comprise”, are to be interpreted in an inclusive sense as including each of the stated integers but without excluding the inclusion of one or more further integers.
In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprises preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted by those skilled in the art
Number | Date | Country | Kind |
---|---|---|---|
2008906383 | Dec 2008 | AU | national |
2009901558 | Apr 2009 | AU | national |
This application is a divisional of U.S. patent application Ser. No. 13/139,052, filed Sep. 14, 2011, pending, which is the U.S. national phase of PCT International Patent Application No. PCT/AU2009/001609 filed Dec. 10, 2009 which designated the U.S. and claims priority to Australian Patent Application No. 2008906383 filed Dec. 10, 2008 and Australian Patent Application No. 2009901558 filed Apr. 9, 2009, the entire contents of each of which are hereby incorporated by reference in this application.
Number | Name | Date | Kind |
---|---|---|---|
4776345 | Cohen et al. | Oct 1988 | A |
4982738 | Griebel | Jan 1991 | A |
5008630 | Reimel | Apr 1991 | A |
20050043645 | Ono et al. | Feb 2005 | A1 |
20070096927 | Albert | May 2007 | A1 |
20070287896 | Derchak et al. | Dec 2007 | A1 |
20100102971 | Virtanen | Apr 2010 | A1 |
Entry |
---|
Abeyratne et al. Proceedings of the International Conference on Information and Automation, Dec. 15-18, 2005, Colombo, Sri Lanka, pp. 25-30, “Sleep Apnoea: a Serious Public Health Concern with Tough Information Processing Challenges.” |
Fernandez et al. (GAPS: Signal, Systems & RadioComm. Dept., Universidad Politecnica de Madrid, Ciudad Universitaria Madrid Spain and ATVS Biometric Recognition Group and Universidad Autonoma de Madrid Spain, pp. 1785-1790, “Design of a multimodal database for research on automatic detection of severe apnoea cases,” May 26-Jun. 1, 2008. |
Alshebeili et al. IEEE Transactions on Circuits and Systems-11: Analog and Digital Signal Processing, vol. 39, No. 9, Sep. 1992, pp. 634-637 “Identification of Nonminimum Phase MA Systems Using Cepstral Operations on Slices of Higher Order Spectra”. |
Abeyratne et al. (Proceedings of the International Conference on Information and Automation, Dec. 15-18, 2005, Colombo, Sri Lanka, pp. 25-30, “Sleep Apnoea: a Serious Public Health Concern with Tough Information Processing Challenges.”). |
Abeyratne, Udantha R., et al, “Pitch-Jitter Analysis of Snoring Sounds for the Diagnosis of Sleep Apnea,” School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Oct. 25, 2001, 5 pages. |
Abeyratne, Udantha R. et al., “Sleep Apnoea: a Serious Public Health Concern with Tough Information Processing Challenges,” Proceedings of the International Conference on Information and Automation, Dec. 15-18, 2005, Colombo, Sri Lanka, pp. 25-30. |
Al-Ani, Tarik, et al, “A Stochastic Learning Based Approach for Automatic Medical Diagnosis Using HMM Toolbox in Scilab Environment,” Proceedings of the 2005 IEEE Conference on Control Applications, Toronto, Canada, Aug. 28-31, 2005, pp. 1099-1103. |
Alshebeili et al., “Identification of Nonminimum Phase MA Systems Using Cepstral Operations on Slices of Higher Order Spectra,” IEEE Transactions on Circuits and Systems-11: Analog and Digital Signal Processing, vol. 39, No. 9, Sep. 1992, pp. 634-637. |
Fernandez, Ruben et al., “Design of a multimodal database for research on automatic detection of severe apnoea cases,” (GAPS: Signal, Systems & RadioComm. Dept., Universidad Politecnica de Madrid, Ciudad Universitaria Madrid Spain and ATVS Biometric Recognition Group and Universidad Autonoma de Madrid Spain, May 26-Jun. 1, 2008, pp. 1785-1790. |
Pozo, Ruben Fernandez et al., “Assessment of Severe Apnoea Through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques,” EURASIP Journal on Advances in Signal Processing, Dec. 2009, vol. 2009, Art. ID 982531, pp. 1-11. |
International Search Report dated Feb. 8, 2010 in PCT International Patent Application No. PCT/AU2009/001609, 5 pp. |
Sola-Soler et al., “Automatic classification of subjects with and without Sleep Apnea through snoring Analysis,” Proceedings of the 29th Annual International Conference of the IEEE EMBS Cite Internationale, Lyon, France, Aug. 23-26, 2007, pp. 6093-6096. |
Written Opinion mailed Feb. 8, 2010 in PCT International Patent Application No. PCT/AU2009/001609, 11 pp. |
Number | Date | Country | |
---|---|---|---|
20150039110 A1 | Feb 2015 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13139052 | US | |
Child | 14518869 | US |