The present invention generally relates, amongst others, to a method for obtaining respiratory related sounds, RRSs, originating from a target patient.
In the field of sleep analysis, one of the elements to study is respiratory related sounds, RRSs. An RRS is a short audio fragment of a sound originating from a patient during their sleep analysis, for example a snoring sound, a sighing sound, a heavy breathing sound, or a moaning sound. Further analysis of such sounds may then be used to diagnose sleep disorders, such as sleep apnoea. It may further be desirable to measure the duration of each RRS, the frequency of the RRSs, and the total number of RRSs, and to analyse various aspects of the RRSs.
The RRSs and related metrics may be obtained from an audio recording of the sleeping patient.
One way to obtain such an audio recording is by attaching a recording microphone to the face of a patient, as close to the patient's nose and mouth as possible. This has the advantage that external sounds and noises are mitigated by design. However, the presence of such a microphone may negatively influence the patient's sleep, and as a result the detected RRSs may not accurately reflect the natural sleep of the patient.
Alternatively, an audio recording device, for example a digital audio recording device such as a mobile phone or a dedicated audio recording device, may be placed further away, in the vicinity of the target patient. This way, the patient is not hindered by a microphone or any other device on or close to their face, resulting in a more natural sleep. The drawback in this case, however, is that the RRSs of another person may end up in the audio recording if the patient is not sleeping alone in the room.
US2020261687A1 discloses a solution for dynamically masking audible breathing noises determined to be generated by one or more sleeping partners. According to aspects, a subject's sleep is protected by detecting audible breathing noises in a sleeping environment, determining that the audible breathing noises are not generated by the subject, and mitigating the perception of the audible breathing noises that are determined to originate from another subject, such as a bed partner, pet, etc. The dynamic masking reduces the subject's exposure to unnecessary sounds and reduces the chances of masking sounds disturbing the subject's sleep.
It is therefore an aim of the present invention to solve or at least alleviate one or more of the above-mentioned problems. In particular, the disclosure aims at providing a method for identifying RRSs of the target patient in a relatively comfortable way without hindering the patient's natural sleep.
To this aim, according to a first aspect, a computer-implemented method for obtaining respiratory related sounds, RRSs, originating from a target patient is provided, the method comprising the steps of:
The input audio recording covers the sleeping environment of the target patient, i.e. apart from the target patient's RRSs, it may further comprise RRSs from other persons or animals and other environment sounds. The input audio recording thus comprises a plurality of the target patient's RRSs. All or part of these are then selected during the selecting step. In order to distinguish the RRSs originating from the target patient from other sounds, the RRSs are selected based on a respiratory trace, i.e. a representation of the target patient's respiration as a function of time that covers the duration of the input audio recording. As the RRSs originating from the target patient are related to the target patient's respiration, there is a relation between these RRSs and the respiration. As a result, the RRSs originating from the target patient can be distinguished from other sounds in the input audio recording.
This results in a set of sounds that is free from other sounds that could negatively influence the analysis, allowing an accurate sleep analysis to be made. Further, as other sounds are filtered out, the audio recording does not need to be performed very close to the patient's mouth or chest. This means that the microphone neither suppresses RRSs from the target patient nor causes unwanted RRSs itself.
Regarding the first subset, only those RRSs with a probability of originating from the target patient above a certain threshold may be selected, e.g. having a probability higher than 90%. This assures a low output error. Further, RRSs with such a high probability are typically easy to determine, i.e. determining them requires little computing power and/or memory capacity.
Regarding the second subset, only those RRSs with a probability of originating from the target patient below a certain threshold may be selected, e.g. a probability lower than 10%. This second subset may then further be discarded from the result.
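The two thresholding steps described above can be sketched as follows. The 90% and 10% figures are the example values given above; the probability estimates themselves are assumed to have been produced by an earlier step and are not computed here:

```python
def split_by_probability(rrs_probs, p_high=0.9, p_low=0.1):
    """Split identified RRSs into a first subset (high probability of
    originating from the target patient), a second subset (low
    probability, to be discarded) and the remaining undetermined RRSs.

    rrs_probs: dict mapping an RRS identifier to its estimated
    probability of originating from the target patient.
    """
    first = {r for r, p in rrs_probs.items() if p > p_high}
    second = {r for r, p in rrs_probs.items() if p < p_low}
    undetermined = set(rrs_probs) - first - second
    return first, second, undetermined
```

An RRS with probability 0.5, for example, ends up in neither subset and remains undetermined, which is exactly the situation the classifier-based refinement below addresses.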
By means of the trained classifier, the results obtained for the first and/or second subset may be further refined by adding other RRSs that were not assigned to the first and/or second subset. To accomplish this, a classifier is first trained with one or both of the subsets to classify the RRSs as either belonging to the target patient or not. In other words, the first and/or second subset is used as labelled data. Then, the trained classifier is used to further classify the other RRSs, resulting in a larger selection of RRSs originating from the target patient.
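As a minimal sketch of this refinement step, assuming feature vectors have already been extracted for each RRS. The classifier type and features are not specified at this point in the disclosure; scikit-learn's `SVC` is used here purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def refine_with_classifier(feat_first, feat_second, feat_other):
    """Train a classifier on the first subset (label 1: target patient)
    and the second subset (label 0: other source), then classify the
    remaining, yet unassigned RRSs. All inputs are (n_rrs, n_features)
    arrays of per-RRS feature vectors."""
    X = np.vstack([feat_first, feat_second])
    y = np.array([1] * len(feat_first) + [0] * len(feat_second))
    clf = SVC().fit(X, y)
    # 1 = attributed to the target patient, 0 = attributed to another source
    return clf.predict(feat_other)
```

The undetermined RRSs predicted as label 1 can then be added to the selection output, enlarging the first subset.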
The respiratory trace may further be obtained by techniques that are available in the art, for example by deriving the trace from a signal obtained by a polysomnograph, an electrocardiograph, an electromyograph, or a photoplethysmogram (PPG).
One step is the identification of RRSs. According to an embodiment, this step further comprises determining respiratory related sounds and non-respiratory related sounds, and discarding the non-respiratory related sounds.
In other words, the sounds that are not related to respiration are discarded from the audio recording first, resulting in a subset of sounds that are RRSs but which do not necessarily originate solely from the target patient. Based on the respiratory trace, the RRSs originating from the target patient are then selected from this subset.
According to an embodiment, the identifying comprises determining sets of sounds; wherein the sounds of a set originate from a same source; and wherein the selecting further comprises, based on the respiratory trace, selecting RRSs from a set of sounds originating from the target patient.
In other words, sounds are first divided into sets or clusters according to their origin. At that point it is not yet known which of the sets originate from the target patient. By reference to the respiratory trace, RRSs of a certain set can then be attributed to the target patient. Optionally, the identifying and discarding of non-RRSs may be performed before or after the determining of the sets.
The clustering of sounds into the sets according to their respective sources may for example be done by a trained classifier.
Optionally, the training of the classifier may only be performed when the number of undetermined RRSs is too high, i.e. when there are still many identified RRSs that have neither a high nor a low probability of originating from the target patient. In such a case it may be useful to perform a more computationally intensive classification operation.
According to an embodiment, the determination of the first subset comprises determining audio timestamps associated with the RRSs from the input audio recording and respiratory timestamps associated with the RRSs from the respiratory trace; and determining the first subset based on the audio and respiratory timestamps.
In other words, the audio timestamps indicate the occurrence of the respective RRSs in the input audio recording and the respiratory timestamps indicate the occurrence of the respective respiratory cycles of the target patient. As the RRSs of the target patient are related to the patient's respiration, the selection can be performed based on these determined timestamps. To this end, a timestamp may be characterized by any detectable time feature, such as for example an onset, a local maximum, or a local minimum. This way, the selection operation is reduced to first identifying the time features and then performing operations on these time features.
One operation may be to determine time differences between the audio timestamps and the respective respiratory timestamps. As the RRSs of the target patient are related to their respiration, the time differences associated with the patient will be rather constant, while the time differences associated with other sources will be more randomly spread.
By then determining a histogram of the time differences, the RRSs having a high probability of belonging to the target patient will be relatively more present in the peak of the histogram and those having a low probability will be relatively more present in the tails of the histogram.
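A sketch of this timestamp-based selection, matching each RRS onset to the nearest respiratory onset and keeping the RRSs whose time difference lies near the histogram peak. The bin count and tolerance are illustrative choices, not values from the disclosure:

```python
import numpy as np

def select_by_timestamp_histogram(audio_ts, resp_ts, tolerance=0.3):
    """For each RRS audio timestamp, find the nearest respiratory
    timestamp and compute the time difference. RRSs whose difference
    lies close to the histogram peak (the mode of the differences)
    get a high probability of originating from the target patient.
    The tolerance (in seconds) is an illustrative parameter."""
    audio_ts = np.asarray(audio_ts, dtype=float)
    resp_ts = np.asarray(resp_ts, dtype=float)
    # Time difference to the nearest respiratory timestamp per RRS.
    diffs = np.array(
        [a - resp_ts[np.argmin(np.abs(resp_ts - a))] for a in audio_ts]
    )
    # Locate the histogram peak (the roughly constant patient offset).
    counts, edges = np.histogram(diffs, bins=20)
    i = np.argmax(counts)
    peak = 0.5 * (edges[i] + edges[i + 1])
    # Boolean mask: True = likely originating from the target patient.
    return np.abs(diffs - peak) <= tolerance
```

In this sketch, RRSs repeating at a constant offset from the respiratory onsets fall into the peak bin, while unrelated sounds scatter into the tails and are rejected.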
According to a second aspect, a controller is disclosed comprising at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the controller to perform a method according to the first aspect.
According to a third aspect a computer program product is disclosed comprising computer-executable instructions for performing a method according to the first aspect when the program is run on a computer.
According to a fourth aspect a computer readable storage medium is disclosed comprising a computer program product according to the third aspect.
The method starts with obtaining an audio track 110 or audio recording 110 from which the RRSs 160 originating from the patient are to be identified or selected. The audio track is recorded within audible distance from the target patient, i.e. within the patient's sleeping environment. This may for example be done by placing an audio recording device next to the patient's bed or somewhere else in the patient's bedroom. An illustrative example of such audio recording is further shown in plot 111 where the amplitude 112 of the recorded audio signal is presented as a function of time.
From this audio recording 110, the different RRSs 131-134 are identified in step 120 of method 100. These identified RRSs may relate to one specific type of RRS, e.g. only snoring, or to several or even all possible RRSs. By the identification of the RRSs, other sounds or noises are excluded from the further steps, e.g. sounds from outside the room. An RRS may for example be identified by indicating its starting time, its ending time, and/or its time period, allowing it to be uniquely identified within the audio recording 110.
The identification of RRSs may for example be performed by executing one or more of the following steps:
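One naive such step, sketched below, is to segment the recording into candidate sound events with a frame-wise energy threshold. The frame length and threshold values are illustrative assumptions, and a practical identification step may instead use a trained detector:

```python
import numpy as np

def detect_sound_events(signal, fs, frame_len=0.05, threshold=0.1):
    """Naive energy-based detector: split the recording into frames and
    mark contiguous runs of frames whose RMS exceeds a threshold as
    candidate sound events, returned as (start, end) pairs in seconds.
    The threshold and frame length are not values from the disclosure."""
    n = int(frame_len * fs)
    events, start = [], None
    for i in range(0, len(signal) - n + 1, n):
        rms = np.sqrt(np.mean(signal[i:i + n] ** 2))
        if rms > threshold and start is None:
            start = i / fs  # event onset
        elif rms <= threshold and start is not None:
            events.append((start, i / fs))  # event offset
            start = None
    if start is not None:  # event still open at end of recording
        events.append((start, len(signal) / fs))
    return events
```

Each detected event would then still need to be classified as respiratory or non-respiratory before the selection step.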
The identified RRSs 130 do not necessarily all originate from the target patient. For example, some of them may originate from another person sleeping next to the patient or within the same room. Also, some RRSs may originate from animals, such as from a dog sleeping in the same room. Therefore, in a subsequent selection step 140, a subset 160 of the RRSs 130 is selected as originating from the monitored patient. To do so, a respiratory trace 150 from the patient is used to select the subset 160. Such a respiratory trace characterizes the breathing of the patient during the period of the audio recording 110. Plot 151 illustrates such a trace of the patient as a function of time. The rising edges may then correspond to an inhalation and the falling edges to an exhalation, or the other way around. A respiratory trace may also correspond to discrete timestamps characterizing different breathing cycles. There is an observable temporal relationship between the trace 150 and the RRSs originating from the patient, while the other RRSs will not show such a temporal relationship. Based on this, the RRSs 160 originating from the patient are selected as output of step 140.
A respiratory trace may be obtained directly or derived indirectly from a measurement on the patient. For example, the trace may be derived from a signal obtained by a polysomnograph, an electrocardiograph, an electromyograph, a photoplethysmogram (PPG), or an accelerometer.
According to an embodiment, the selection 140 of RRSs 160 may be performed by the steps 200 as illustrated in
Another way of selecting the patient RRSs 160 is by calculating the coherence of one or more RRSs 130 with the respiratory trace 150, i.e. the degree of synchronization between the audio signal of the one or more RRSs and the respiratory signal during the same time interval. In this case, one or more RRSs with a high coherence are considered as having a high probability of originating from the patient and one or more RRSs with a low coherence are considered as having a low probability of originating from the patient, thereby again obtaining similar sets 210, 211, 212 of RRSs. Similar to the method of
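A sketch of such a coherence computation, using SciPy's `coherence` on the amplitude envelope of an RRS and the respiratory trace sampled at the same rate. The breathing-band limits below (0.1-0.5 Hz, i.e. 6-30 breaths per minute) are an assumption, not values from the disclosure:

```python
import numpy as np
from scipy.signal import coherence

def rrs_coherence(audio_env, resp_trace, fs, band=(0.1, 0.5)):
    """Mean magnitude-squared coherence between an RRS amplitude
    envelope and the respiratory trace, averaged over a typical
    breathing frequency band. Higher values indicate a higher
    probability that the RRS originates from the target patient."""
    f, cxy = coherence(
        audio_env, resp_trace, fs=fs, nperseg=min(256, len(audio_env))
    )
    mask = (f >= band[0]) & (f <= band[1])
    return float(np.mean(cxy[mask])) if mask.any() else 0.0
```

Thresholding this score high and low then yields subsets analogous to those obtained with the timestamp-based selection.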
The selection of RRSs from the patient by probabilities, e.g. by the steps of
Similar to
According to an embodiment, a further clustering step may be performed in the method 100 as illustrated in
A way of clustering 470 is to first determine a set of features characterizing the RRSs, for example Mel-frequency cepstral coefficients, MFCCs, the signal power within a specific frequency range, temporal features such as the signal mean and standard deviation, features characterizing the entropy of the RRS, and features characterizing the formant and pitch. Additionally or complementarily, RRSs occurring in a temporally repetitive pattern may be identified, thereby obtaining different chains of RRSs. Then the RRSs are clustered into different plausible sources based on the association with the temporal chain and/or based on the similarities between the different derived features. Clustering based on features may for example be performed by clustering algorithms such as K-means clustering and Gaussian Mixture Model, GMM, clustering. Clustering based on the obtained temporal chains may for example be performed by identifying repetitive RRS patterns that have a specific time interval between occurrences. After the clustering, some RRSs may still be left unassigned, i.e. not attributed to a certain source with a high probability. In such a case, a further supervised clustering step can be performed. A classifier is then trained to classify RRSs into clusters by using the already clustered RRSs as labelled training data. For the classifier, a support vector machine, SVM, or neural network may be used.
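The feature-based clustering can be sketched as follows with scikit-learn's K-means, one of the two algorithms named above. The number of sources and the content of the feature vectors are assumptions; in practice the vectors would stack MFCCs, band powers, and temporal statistics per RRS:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_rrs(feature_vectors, n_sources=2, seed=0):
    """Cluster per-RRS feature vectors into plausible sources with
    K-means. feature_vectors is an (n_rrs, n_features) array; the
    returned labels assign each RRS to one candidate source. GMM
    clustering, also mentioned above, could be substituted here."""
    km = KMeans(n_clusters=n_sources, n_init=10, random_state=seed)
    return km.fit_predict(np.asarray(feature_vectors))
```

The resulting clusters, rather than individual RRSs, are then attributed to the target patient or not in the subsequent selection step.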
The so-obtained clusters of RRSs 471 are then used as input for the further selection step 440, in which clusters with a high and/or low probability of originating from the patient are identified. The clusters with a high probability are then selected as output 160. Step 440 may be performed in the same way as step 140 or as step 200, but based on clusters of RRSs instead of individual RRSs. Further, an additional step 403 may be performed wherein yet unassigned clusters of RRSs are added to the output 160 in the same way as step 303, but based on clusters of RRSs instead of individual RRSs.
The steps according to the above described embodiments may be performed by any suitable computing circuitry, for example a mobile phone, a tablet, a desktop computer, a laptop, or a local or remote server. The steps according to the above described embodiments may be performed on the same device as the audio recording device. To this end, the audio recording may also be performed by, for example, a mobile phone, a tablet, a desktop computer, or a laptop. The steps according to the above described embodiments may also be performed by suitable circuitry remote from the environment of the patient. In such a case, the audio recording may be provided to the circuitry over a communication network such as the Internet or a private network.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
Although the present invention has been illustrated by reference to specific embodiments, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied with various changes and modifications without departing from the scope thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. In other words, it is contemplated to cover any and all modifications, variations or equivalents that fall within the scope of the basic underlying principles and whose essential attributes are claimed in this patent application. It will furthermore be understood by the reader of this patent application that the words “comprising” or “comprise” do not exclude other elements or steps, that the words “a” or “an” do not exclude a plurality, and that a single element, such as a computer system, a processor, or another integrated unit may fulfil the functions of several means recited in the claims. Any reference signs in the claims shall not be construed as limiting the respective claims concerned. The terms “first”, “second”, “third”, “a”, “b”, “c”, and the like, when used in the description or in the claims are introduced to distinguish between similar elements or steps and are not necessarily describing a sequential or chronological order. Similarly, the terms “top”, “bottom”, “over”, “under”, and the like are introduced for descriptive purposes and not necessarily to denote relative positions.
It is to be understood that the terms so used are interchangeable under appropriate circumstances and embodiments of the invention are capable of operating according to the present invention in other sequences, or in orientations different from the one(s) described or illustrated above.
Number | Date | Country | Kind |
---|---|---|---|
20198192.5 | Sep 2020 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/076160 | 9/23/2021 | WO |