The embodiment discussed herein is directed to a breath detection device and a breath detection method.
In recent years, “sleep apnea”, the cessation of breathing during sleep, has been attracting attention, and there is a demand for a technique that detects a breathing state during sleep accurately and easily. Conventional technologies for breath detection include a technology that performs frequency conversion on an input voice of a subject and compares the magnitude of each frequency component with a threshold, thereby detecting a sleeper's breathing, snoring, roaring sounds, and the like.
As another conventional technology for breath detection, there is a technology that collects sounds around a subject while the subject is sleeping and determines a period in which there is a sound as a period in which the subject is breathing. In this conventional technology, a cycle of appearance of periods in which there is a sound is detected as the pace of breathing, and, if there is no sound at the timing of breathing, this period in which there is no sound is detected as an apnea period. These related-art examples are described, for example, in Japanese Laid-open Patent Publication No. 2007-289660 and Japanese Laid-open Patent Publication No. 2009-219713.
However, the above-mentioned conventional technologies have a problem in that a breath sound cannot be detected accurately.
In the technology that detects a subject's breathing by comparing the magnitude of each frequency component with a fixed threshold, noise around the subject may cause an incorrect determination that the subject is breathing. Furthermore, the technology that determines a subject's breathing on the basis of whether there is a sound is premised on the collected sounds not including any noise; therefore, it cannot detect a breath sound accurately in an environment in which noise occurs.
According to an aspect of an embodiment, a breath detection device includes a memory and a processor coupled to the memory. The processor executes a process including: first calculating a frequency spectrum that associates each frequency with a signal strength with respect to the frequency, by dividing an input sound signal into multiple frames and performing frequency conversion on each of the frames; shifting the calculated frequency spectrum of a given frame in a frequency direction; second calculating a first similarity indicating how well the frequency spectrum before the shift matches the frequency spectrum after the shift; third calculating a second similarity by finding a cross-correlation between the frequency spectrum of the given frame and a frequency spectrum of a frame previous to the given frame; and determining whether the frequency spectrum of the given frame indicates breath on the basis of the first similarity and the second similarity.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Preferred embodiments of the present invention will be explained with reference to the accompanying drawings. Incidentally, the present invention is not limited to these embodiments.
A configuration of the breath detection device according to the present embodiment is explained.
The input signal dividing unit 110 is a processing unit that divides an input signal into multiple frames. The input signal dividing unit 110 outputs the divided frames to the FFT processing unit 120 in chronological order. The input signal is, for example, a sound signal of a sound around a subject collected through a microphone.
The input signal dividing unit 110 divides an input signal into frames each containing a predetermined number N of samples. N is a natural number. The nth frame of the divided input signal is referred to as xn(t), provided that t=0, 1, . . . , N−1.
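The frame division performed by the input signal dividing unit 110 can be sketched as follows. How trailing samples that do not fill a whole frame are handled is not specified in the text, so discarding them is an assumption of this sketch.

```python
import numpy as np

def divide_into_frames(signal, n_samples):
    """Split an input signal into consecutive frames of N samples each,
    as done by the input signal dividing unit 110.

    Trailing samples that do not fill a whole frame are discarded
    (an assumption; the embodiment does not specify this case).
    """
    n_frames = len(signal) // n_samples
    return [signal[i * n_samples:(i + 1) * n_samples] for i in range(n_frames)]

# Example: one second of signal at a 16 kHz sampling frequency,
# divided into frames of N = 256 samples.
signal = np.zeros(16000)
frames = divide_into_frames(signal, 256)
print(len(frames))     # 62 full frames (16000 // 256)
print(len(frames[0]))  # 256
```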
The FFT processing unit 120 is a processing unit that analyzes which frequency components an input signal contains and at what strengths, thereby calculating a frequency spectrum. The FFT processing unit 120 outputs the frequency spectrum to the harmonic-wave-structure estimating unit 130, the cross-correlation estimating unit 140, and the average-breath-spectrum estimating unit 160.
Here, a frequency spectrum of an input signal xn(t) is referred to as s(f), provided that f=0, 1, . . . , K−1. K denotes the number of FFT points. When the sampling frequency of the input signal is 16 kHz, a value of K is, for example, 256.
When a real part is denoted by Re(f), and an imaginary part is denoted by Im(f), the frequency spectrum s(f) calculated by the FFT processing unit 120 can be expressed by equation (1).
s(f) = |Re(f)² + Im(f)²|  (1)
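Equation (1) can be computed directly from the FFT of one frame. A minimal sketch, with the 16 kHz sampling frequency and K = 256 FFT points given above:

```python
import numpy as np

def frequency_spectrum(frame, k=256):
    """Compute the frequency spectrum s(f) = |Re(f)^2 + Im(f)^2|
    per equation (1), for f = 0, 1, ..., K-1."""
    spec = np.fft.fft(frame, n=k)
    return np.abs(spec.real ** 2 + spec.imag ** 2)

# Example: a 1 kHz tone sampled at 16 kHz places its energy at
# bin f = 1000 / 16000 * 256 = 16 (exactly 16 cycles per frame).
fs = 16000
t = np.arange(256)
frame = np.sin(2 * np.pi * 1000 * t / fs)
s = frequency_spectrum(frame)
print(int(np.argmax(s[:128])))  # 16
```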
The harmonic-wave-structure estimating unit 130 is a processing unit that finds autocorrelation of a frequency spectrum. The harmonic-wave-structure estimating unit 130 finds autocorrelation Acor(d) on the basis of equation (2).
In equation (2), d denotes a variable representing a delay. When the sampling frequency of the input signal is 16 kHz and the number of FFT points is 256, the delay d takes values from 6 to 20. The harmonic-wave-structure estimating unit 130 varies the value of d from 6 to 20 sequentially, and finds an autocorrelation Acor(d) with respect to each of the different delays d. The harmonic-wave-structure estimating unit 130 then finds the maximum autocorrelation Acor(d1) among the autocorrelations Acor(d). Here, d1 denotes the delay resulting in the maximum autocorrelation. The harmonic-wave-structure estimating unit 130 outputs the autocorrelation Acor(d1) to the breath detecting unit 150.
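Equation (2) itself is not reproduced above, so the sketch below assumes one standard normalized lag-product form of spectral autocorrelation; the delay range of 6 to 20 and the selection of the maximum Acor(d1) follow the text.

```python
import numpy as np

def max_autocorrelation(s, d_min=6, d_max=20):
    """Return (Acor(d1), d1): the maximum autocorrelation of the
    frequency spectrum s(f) over delays d = d_min..d_max.

    The exact equation (2) is not reproduced in the text, so a
    standard normalized lag-product form is assumed here:
        Acor(d) = sum_f s(f) * s(f + d) / sum_f s(f) * s(f)
    """
    energy = np.dot(s, s)
    best_acor, best_d = -np.inf, d_min
    for d in range(d_min, d_max + 1):
        acor = np.dot(s[:-d], s[d:]) / energy
        if acor > best_acor:
            best_acor, best_d = acor, d
    return best_acor, best_d

# Example: a spectrum with harmonic-like peaks every 10 bins, as a
# breath-free stand-in for a harmonic structure of period 10.
s = np.zeros(256)
s[::10] = 1.0
acor, d1 = max_autocorrelation(s)
print(d1)  # 10
```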
A method to calculate an autocorrelation is explained.
Incidentally, the harmonic-wave-structure estimating unit 130 can find an autocorrelation on the basis of equation (3) instead of equation (2). By using equation (3), the influence of offset of the frequency spectrum s(f) can be eliminated. It is provided that s(−1)=0.
The cross-correlation estimating unit 140 is a processing unit that finds a cross-correlation Ccor(n) between the frequency spectrum of the current frame and an average breath spectrum on the basis of equation (4).
In equation (4), save(f) denotes an average frequency spectrum of frequency spectra of previous frames containing a breath sound. The average frequency spectrum is hereinafter referred to as the average breath spectrum. The cross-correlation estimating unit 140 acquires the average breath spectrum save(f) from the average-breath-spectrum estimating unit 160.
When the same spectral feature appears periodically, as in breathing, the cross-correlation takes a large value. On the other hand, when the same spectral feature does not appear periodically, as in voice, the cross-correlation takes a small value.
Incidentally, the cross-correlation estimating unit 140 can find a cross-correlation on the basis of equation (5) instead of equation (4). By using equation (5), the influence of offset of the frequency spectrum s(f) can be eliminated. It is provided that s(−1)=save(−1)=0.
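Equation (4) is not reproduced above. Since the thresholds discussed later (1.75 and 1.00) exceed 1, Ccor(n) is evidently not bounded by 1, so the sketch below assumes a form normalized by the energy of the average breath spectrum only; this normalization is an assumption of the sketch.

```python
import numpy as np

def cross_correlation(s, s_ave):
    """Ccor(n) between the current frame's spectrum s(f) and the
    average breath spectrum save(f).

    Assumed form (equation (4) is not reproduced in the text):
        Ccor = sum_f s(f) * save(f) / sum_f save(f)^2
    so Ccor exceeds 1 when s is stronger than save, consistent with
    thresholds larger than 1 being meaningful.
    """
    denom = np.dot(s_ave, s_ave)
    if denom == 0.0:
        return 0.0
    return np.dot(s, s_ave) / denom

s_ave = np.ones(256)
print(cross_correlation(np.ones(256), s_ave))      # 1.0
print(cross_correlation(2 * np.ones(256), s_ave))  # 2.0
```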
The breath detecting unit 150 is a processing unit that determines whether a breath sound is contained in a current frame on the basis of the autocorrelation Acor(d1) and the cross-correlation Ccor(n).
The breath detecting unit 150 finds a determination threshold Th on the basis of equation (6). In equation (6), β is a constant, and is set to a value ranging from 1 to 10.
Th = β × Acor(d1)  (6)
After finding the threshold Th, the breath detecting unit 150 compares a value of Ccor(n) with the threshold Th, and, when a value of Ccor(n) is larger than the threshold Th, determines that a breath sound is contained in the current frame. On the other hand, when a value of Ccor(n) is equal to or smaller than the threshold Th, the breath detecting unit 150 determines that a breath sound is not contained in the current frame.
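The decision made by the breath detecting unit 150 reduces to one comparison. A sketch using the example autocorrelation values discussed later in the text (0.20 for breath, 0.35 for voice) with β = 5.0; the Ccor value 1.2 is an illustrative number, not one from the text:

```python
def detect_breath(ccor, acor_max, beta=5.0):
    """Decide whether the current frame contains a breath sound.

    Th = beta * Acor(d1)  (equation (6)); beta is a constant from
    1 to 10.  A breath sound is detected when Ccor(n) exceeds Th.
    """
    th = beta * acor_max
    return ccor > th

# With an illustrative Ccor of 1.2:
print(detect_breath(1.2, 0.20))  # True:  1.2 > 5.0 * 0.20 = 1.00 (breath)
print(detect_breath(1.2, 0.35))  # False: 1.2 <= 5.0 * 0.35 = 1.75 (voice)
```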
When the breath detecting unit 150 has determined that a breath sound is contained in the current frame, the breath detecting unit 150 outputs the current frame to the average-breath-spectrum estimating unit 160.
The average-breath-spectrum estimating unit 160 is a processing unit that averages frames containing a breath sound, thereby calculating an average breath spectrum save(f). The average-breath-spectrum estimating unit 160 updates the average breath spectrum save(f) on the basis of equation (7), and outputs the updated average breath spectrum to the cross-correlation estimating unit 140. In equation (7), α is a constant, and is set to a value ranging from 0 to 1.
save(f) = α·save(f) + (1−α)·s(f)  (7)
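Equation (7) is an exponential moving average over frames determined to contain a breath sound; a minimal sketch:

```python
import numpy as np

def update_average_breath_spectrum(s_ave, s, alpha=0.9):
    """Update the average breath spectrum per equation (7):
        save(f) = alpha * save(f) + (1 - alpha) * s(f)
    where alpha is a constant from 0 to 1.  Called only for frames
    determined by the breath detecting unit 150 to contain breath.
    """
    return alpha * s_ave + (1.0 - alpha) * s

s_ave = np.zeros(4)
s = np.array([1.0, 2.0, 3.0, 4.0])
s_ave = update_average_breath_spectrum(s_ave, s, alpha=0.9)
print(s_ave)  # approximately [0.1 0.2 0.3 0.4]
```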
Subsequently, a frequency spectrum of voice and a frequency spectrum of breath are explained by comparison.
In the frequency spectrum 5a of voice, frequency signals are generated irregularly. On the other hand, in the frequency spectrum 6a of breath, frequency signals are generated regularly.
Subsequently, autocorrelation of voice and autocorrelation of breath are explained by comparison.
In the autocorrelation 10a of voice, the maximum value of the autocorrelation is 0.35. On the other hand, in the autocorrelation 10b of breath, the maximum value of the autocorrelation is 0.20. Therefore, the maximum value of the autocorrelation 10a of voice is larger than that of the autocorrelation 10b of breath.
Subsequently, cross-correlation of voice and cross-correlation of breath are explained by comparison.
A threshold 12a of the cross-correlation 11a of voice is a threshold calculated on the basis of the autocorrelation of voice. For example, when the maximum value of the autocorrelation of voice is 0.35 and the value of β is 5.0, the threshold 12a is 5.0 × 0.35 = 1.75.
A threshold 12b of the cross-correlation 11b of breath is a threshold calculated on the basis of the autocorrelation of breath. For example, when the maximum value of the autocorrelation of breath is 0.20 and the value of β is 5.0, the threshold 12b is 5.0 × 0.20 = 1.00.
Subsequently, a procedure of a process performed by the breath detection device 100 is explained.
The breath detection device 100 divides an input signal into multiple frames, performs FFT on each of the frames to calculate a frequency spectrum, and calculates autocorrelation of the frequency spectrum.
The breath detection device 100 calculates cross-correlation (Step S105), and determines a threshold on the basis of the maximum value of the autocorrelation (Step S106). The breath detection device 100 compares the cross-correlation with the threshold, thereby detecting whether a breath sound is contained in the input signal (Step S107), and outputs a result of the detection (Step S108).
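The per-frame flow above can be sketched end to end as follows. This is a self-contained sketch: the correlation formulas are standard forms assumed here (equations (2) and (4) are not reproduced in the text), and bootstrapping the average breath spectrum from the first frame is likewise an assumption.

```python
import numpy as np

def detect_breath_frames(signal, n=256, k=256, beta=5.0, alpha=0.9,
                         d_min=6, d_max=20):
    """Per-frame sketch of the overall process: divide the input
    signal into frames, FFT each frame (equation (1)), compute
    spectral autocorrelation over delays 6..20, compute a
    cross-correlation against a running average breath spectrum,
    and compare it with Th = beta * Acor(d1) (equation (6))."""
    s_ave = None
    results = []
    for i in range(len(signal) // n):
        frame = signal[i * n:(i + 1) * n]
        spec = np.fft.fft(frame, n=k)
        s = np.abs(spec.real ** 2 + spec.imag ** 2)  # equation (1)

        # Acor(d1): maximum spectral autocorrelation (assumed form).
        energy = np.dot(s, s) or 1.0
        acor_max = max(np.dot(s[:-d], s[d:]) / energy
                       for d in range(d_min, d_max + 1))

        # Ccor(n) against the average breath spectrum (assumed form).
        if s_ave is None:
            s_ave = s.copy()  # bootstrap from first frame (assumption)
        denom = np.dot(s_ave, s_ave) or 1.0
        ccor = np.dot(s, s_ave) / denom

        is_breath = ccor > beta * acor_max  # equation (6)
        if is_breath:
            s_ave = alpha * s_ave + (1 - alpha) * s  # equation (7)
        results.append(bool(is_breath))
    return results

# Example on one second of noise at 16 kHz (62 frames of 256 samples).
rng = np.random.default_rng(0)
results = detect_breath_frames(rng.standard_normal(16000))
print(len(results))  # 62
```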
Subsequently, the effects of the breath detection device 100 according to the present embodiment are explained. When a breath sound is contained in an input signal, the autocorrelation is small and the cross-correlation is large. This characteristic holds equally in a case where noise is contained in the input signal. Therefore, without being affected by noise, the breath detection device 100 can accurately detect a frame containing a breath sound by determining whether a breath sound is contained in a frame on the basis of the autocorrelation and the cross-correlation of the input signal.
The breath detection device 100 according to the present embodiment finds an average breath spectrum by weighted-averaging frequency spectra of frames containing a breath sound, and finds a cross-correlation between the frequency spectrum of a current frame and the average breath spectrum. Therefore, it is possible to reduce the influence of variation among the frequency spectra of previous frames containing a breath sound and to find the cross-correlation accurately.
The breath detection device 100 according to the present embodiment compares a value of β times a value of autocorrelation with a value of cross-correlation, thereby determining whether a breath sound is contained in a current frame. By adjusting a value of β, whether a breath sound is contained in a current frame can be accurately determined in various environments.
Incidentally, the components of the breath detection device 100 are functionally conceptual, and do not necessarily have to be physically configured as illustrated.
A breath detection device discussed herein can detect a breath sound accurately.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority or inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation of International Application No. PCT/JP2010/066959, filed on Sep. 29, 2010, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2010/066959 | Sep 2010 | US |
| Child | 13780274 | | US |