Current state-of-the-art vital sign monitoring systems widely used in patient care settings such as inpatient wards rely on sensors attached directly to the patient's body to measure respiratory and heart rates. Such contact-based approaches are inconvenient for patients due to the obtrusive nature of the measurement procedure. In the past few years, radar-operated non-contact monitoring systems have been proposed. However, radar systems are highly prone to electromagnetic interference and multi-path effects, which hinder the discrimination of humans from other objects, especially in closed environments such as homes, offices, inpatient wards, and other enclosed spaces such as a spacecraft during long-duration space flight.
Each patent, publication, and non-patent literature cited in the application is hereby incorporated by reference in its entirety as if each was incorporated by reference individually.
In some embodiments, the disclosure provides a system to estimate heart rate and/or respiratory rate of a subject comprising: a) an optical camera and a depth camera configured to receive optical and depth data of the subject in the field of view; b) a camera processing unit communicatively coupled with the optical camera and the depth camera and a radar processing unit, wherein the camera processing unit extracts at least one anatomical landmark from the body of the subject based at least upon the optical data; c) a radar configured to send and receive a radar signal; and d) a radar processing unit communicatively coupled with the radar and the camera processing unit to identify a point of interest on the subject's torso and extract heart rate and respiratory rate based at least upon the movement of the point of interest. In some embodiments, the disclosed system further comprises a database module to record at least the estimated heart rate and/or respiratory rate of the subject.
In some embodiments, the disclosure provides a computer-implemented method for estimating heart rate and/or respiratory rate of a subject, the method comprising: a) receiving optical and depth data of the subject in the field of view; b) extracting at least one anatomical landmark from the body of the subject based at least upon the optical data; c) sending a radar signal; d) receiving a radar signal and using the radar signal to identify a point of interest on the subject's torso; and e) extracting heart rate and respiratory rate based at least upon the movement of the point of interest.
Further objects, features and advantages of the present disclosure will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the present disclosure, in which:
Throughout the drawings, the same reference numeral, and characters, unless otherwise stated, are used to denote like features, elements, components, or portions of the illustrated embodiments. Moreover, while the present disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments and is not limited by the embodiments illustrated in the figures.
Any algorithm described herein can be embodied in software or a set of computer-executable instructions capable of being run on a computing device or devices. The computing device or devices can include one or more processors (CPUs) and a computer memory. The computer memory can be or include a non-transitory computer storage media such as RAM which stores the set of computer-executable (also known herein as computer readable) instructions (software) for instructing the processor(s) to carry out any of the algorithms, methods, or routines described in this disclosure. As used in the context of this disclosure, a non-transitory computer-readable medium (or media) can include any kind of computer memory, including magnetic storage media, optical storage media, nonvolatile memory storage media, and volatile memory. Non-limiting examples of non-transitory computer-readable storage media include floppy disks, magnetic tape, conventional hard disks, CD-ROM, DVD-ROM, BLU-RAY, Flash ROM, memory cards, optical drives, solid state drives, flash drives, erasable programmable read only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), non-volatile ROM, and RAM. The computer-readable instructions can be programmed in any suitable programming language, including JavaScript, C, C#, C++, Java, Python, Perl, Ruby, Swift, Visual Basic, and Objective C. Embodiments of the invention also include a non-transitory computer readable storage medium having any of the computer-executable instructions described herein.
A skilled artisan will further appreciate, in light of this disclosure, how the invention can be implemented, in addition to software and hardware, using firmware. As such, embodiments of the invention can be implemented in a system which includes any combination of software, hardware, or firmware. In the context of this specification, the term “firmware” can include any software programmed onto the computing device, such as a device's nonvolatile memory. Thus, systems of the invention can also include, alternatively or in addition to the computer-executable instructions, various firmware modules configured to perform the algorithms of the invention.
According to embodiments, the computing device or devices can include a mainframe computer, web server, database server, desktop computer, laptop, tablet, netbook, notebook, personal digital assistant (PDA), gaming console, e-reader, smartphone, or smartwatch, which may include features such as a processor, memory, hard drive, graphics processing unit (GPU), and input/output devices such as display, keyboard, and mouse or trackpad (depending on the device). Embodiments can also provide a graphical user interface made available on one or more client computers. The graphical user interface can allow a user on a client computer remote access to the method or algorithm.
Additional embodiments of the invention can include a networked computer system for carrying out one or more methods of the invention. The computer system can include one or more computing devices which can include a processor for executing computer-executable instructions, one or more databases, a user interface, and a set of instructions (e.g., software) for carrying out one or more methods of the invention. According to other embodiments, the computing device or devices can be connected to a network through any suitable network protocol such as IP, TCP/IP, UDP, or ICMP, such as in a client-server configuration and one or more database servers. The network can use any suitable network protocol and can be any suitable wired or wireless network including any local area network, wide area network, Internet network, telecommunications network, Wi-Fi enabled network, or Bluetooth enabled network.
The camera processing unit analyzes RGB (or other grayscale or color images) and depth videos (
The optical and depth videos are used to identify one or more subjects in the field of view, estimate the body pose or human pose (i.e., identifying and classifying the locations of the joints in the human body), and identify anatomical landmarks visible to the camera. Cartesian coordinates (x, y, z) can be used to specify the exact location of a point of interest (e.g., an anatomical landmark or body joint) in physical three-dimensional space. In some embodiments, the x and y coordinates of the various anatomical landmarks are detected using the optical data. In some embodiments, the x, y, and z coordinates of the various anatomical landmarks are detected using the depth data. In some embodiments, the depth sensor is used to find the third physical coordinate (z) in space, where the (x, y) coordinates are identified using optical or depth data.
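As a minimal sketch of recovering the third coordinate from an aligned depth frame, a pinhole-camera back-projection can be used; the function name, intrinsics (fx, fy, cx, cy), and the synthetic depth map below are illustrative assumptions, not the disclosure's calibration:

```python
import numpy as np

# Hypothetical pinhole back-projection: given a pixel (u, v) located in the
# optical frame and an aligned depth frame, recover the 3-D point (x, y, z).
def pixel_to_point(u, v, depth_frame, fx, fy, cx, cy):
    z = depth_frame[v, u]             # depth value at the pixel (metres)
    x = (u - cx) * z / fx             # back-project using the intrinsics
    y = (v - cy) * z / fy
    return np.array([x, y, z])

depth = np.full((480, 640), 1.5)      # synthetic flat depth map, 1.5 m away
p = pixel_to_point(320, 240, depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

A pixel at the principal point maps to a point straight ahead of the camera at the measured depth.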
In some embodiments, a machine learning classifier is used to identify the coordinates of the subject based on optical or depth frames. An example classifier to estimate human pose includes neural networks. In some embodiments, pose estimation frameworks described in Cao, Zhe, et al. “Realtime multi-person 2d pose estimation using part affinity fields.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017 and Xiao, Bin, Haiping Wu, and Yichen Wei. “Simple baselines for human pose estimation and tracking.” Proceedings of the European Conference on Computer Vision (ECCV). 2018 can be used.
The landmarks identified by the pose estimation algorithm are 2-dimensional. As described above, the 3rd dimension is identified using the one-to-one mapping of (x,y) to (x,y,z) coordinates intrinsic to the optical and depth cameras. The detected coordinates of the landmarks are used to identify one or more points of interest. In some embodiments, the point of interest is located on the torso of the subject (
The coordinates of the detected point of interest are used by the radar processing unit to extract movement information from the signal received by the radar.
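As one illustration of deriving a torso point of interest from the detected landmarks, the centroid of the shoulder and hip landmarks can serve as the tracked point; the landmark names and coordinate values below are assumptions for the sketch:

```python
import numpy as np

# Hypothetical landmark coordinates (x, y, z) in metres; the torso point of
# interest is taken as the centroid of shoulders and hips.
landmarks = {
    "left_shoulder":  np.array([0.30, 0.40, 1.50]),
    "right_shoulder": np.array([0.50, 0.40, 1.52]),
    "left_hip":       np.array([0.32, 0.80, 1.49]),
    "right_hip":      np.array([0.48, 0.80, 1.51]),
}
torso_point = np.mean(list(landmarks.values()), axis=0)
```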
The transmitted LFM signal is characterized by a chirp rate and an initial frequency, denoted by β and α respectively, as given below:

x_T(t) = A_T exp(jπβt² + j2παt),  (1)
where A_T and x_T(t) represent the amplitude and the transmitted chirp, respectively. Assuming a noise-less channel, x_T(t) is reflected toward the receiving antenna with a round-trip time of T, and de-ramped (de-modulated) by mixing the signal with a replica of the transmitted signal at the receiver. The de-ramped signal can be approximated as follows:
x*_T(t) x_R(t, T) ≈ A_T A_R exp(−j2πβTt − j2παT),  (2)
where x*_T(t), x_R(t, T), and A_R represent the complex conjugate of the transmitted signal, the received signal delayed by T, and the amplitude of the received signal, respectively. The de-ramped signal represents a single-tone waveform with a frequency of f_0 = βT and an initial phase of ϕ = αT. Hence, the Fourier transform of the mixed signal, called the range profile, is equal to a Dirac delta function centered at f_0 in the frequency domain with an initial phase ϕ, as given below:
ℱ{x*_T(t) x_R(t, T)} = A_T A_R δ(f + βT) exp(−j2παT) = A_T A_R δ(f + f_0) exp(−j2πϕ),  (3)
where δ(·) and ℱ{·} denote the Dirac delta function and the Fourier transform, respectively. Since small movements around the fixed point corresponding to the subject's distance are induced by cardiopulmonary vibrations, the round-trip time can be expressed as
with R and c representing the fixed distance from the subject's chest to the radar and the speed of light, respectively. Furthermore, Δr(t) denotes the small movements induced by the cardiopulmonary system. Substituting the expanded term of the round-trip time into the estimated f_0 and ϕ results in
respectively. Δr(t) can be neglected in f_0, as it holds a significantly smaller value than R and therefore does not shift the corresponding range-frequency bin. Conversely, the small chest movements are represented by ϕ_0 = αΔr(t)/c in ϕ, which is not negligible, as it appears in the phase that represents the movement. As a result, the movement can be recovered by performing a Fourier transform on the de-ramped signal, which yields a dominant peak at the range bin associated with the subject's distance. The phase variation of this range bin along successive beat signals (also known as the slow-time axis) amounts to the temporal evolution of the chest movement.
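The de-ramping step can be checked numerically: mixing the delayed echo with the conjugate of the transmitted chirp yields a single tone at f_0 = βT. All parameter values below are illustrative assumptions, not the system's actual settings:

```python
import numpy as np

# Simulate one LFM chirp and its delayed echo, then de-ramp by mixing.
fs = 1e6                                  # sampling rate within one chirp (Hz)
t = np.arange(0, 1e-3, 1/fs)              # 1 ms chirp (1000 samples)
beta, alpha = 10e9, 1e5                   # chirp rate (Hz/s), start frequency (Hz)
T = 20e-6                                 # round-trip delay of the target (s)

x_t = np.exp(1j*np.pi*beta*t**2 + 2j*np.pi*alpha*t)            # transmitted
x_r = np.exp(1j*np.pi*beta*(t-T)**2 + 2j*np.pi*alpha*(t-T))    # delayed echo
deramped = np.conj(x_t) * x_r             # ≈ exp(-j2πβT·t - j2παT)

# The FFT of the de-ramped signal peaks at the beat frequency f0 = βT.
freqs = np.fft.fftfreq(len(t), 1/fs)
f0 = abs(freqs[np.argmax(np.abs(np.fft.fft(deramped)))])       # ≈ 200 kHz here
```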
As mentioned earlier, the coordinates of one or more points of interest on the subject are provided by the camera processing unit to the radar processing unit. However, the center point of the camera and its corresponding axes are not aligned with those of the radar. In this invention, the radar and the camera sensors are assumed to be consolidated into an enclosure and spaced apart by a predefined distance. This setup allows for triangulating the coordinates acquired by the camera into the radar coordinate system. Assuming (x, y, z) are the coordinates of a point of interest on the subject provided by the camera processing unit, the corresponding coordinates from the radar viewpoint will be (x̂, ŷ, ẑ) = (x−x_0, y−y_0, z−z_0), where x_0, y_0, and z_0 denote the difference between the radar and camera center points that should be offset as shown in
where ϕ = arctan(x̂/ẑ) and θ = arctan(ŷ/ ), and λ = c/f_c denotes the wavelength corresponding to the carrier frequency (f_c) of the transmitted signal. Therefore, the steering vector matrix is given by v(ϕ, θ) = [e^(−j∂
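A minimal sketch of translating a camera-frame point into radar look angles follows. The mounting offsets are assumptions, and because the denominator of the elevation angle θ is elided in the text, √(x̂² + ẑ²) is used here as one common convention:

```python
import numpy as np

# Translate a camera-frame point into the radar frame, then compute angles.
x, y, z = 0.45, 0.62, 1.80                # camera-frame point of interest (m)
x0, y0, z0 = 0.05, 0.00, 0.00             # camera-to-radar offsets (m), assumed

xh, yh, zh = x - x0, y - y0, z - z0       # point in the radar frame
phi = np.arctan2(xh, zh)                  # azimuth, arctan(x̂/ẑ)
theta = np.arctan2(yh, np.hypot(xh, zh))  # elevation (assumed convention)
```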
As shown in
Y_(64×L) = v_(64×1) a_(1×L) + N_(64×L),  (5)
where a and v represent the radar signal and the corresponding directions of arrival, respectively, with the covariance matrix of the noise being S_N = E{NN^H}, where E{·} denotes expectation and (·)^H the Hermitian (conjugate-transpose) operator. In some embodiments, the minimum-variance distortionless response (MVDR) beam-steering technique described in Capon, Jack, “High-resolution frequency-wavenumber spectrum analysis,” Proceedings of the IEEE 57.8 (1969): 1408-1418, can be used as an exemplary method to coherently combine all 64 signals and enhance the signal quality in the subject's direction provided by the camera. The beam-steering vector can be calculated by
where S_N^(−1) denotes the inverse of the noise covariance matrix. The weights are multiplied by the beat signals of their corresponding antennas, and a single de-ramped signal is produced. Applying a fast Fourier transform (FFT) to the de-ramped signal yields the range profile signal.
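The MVDR combination can be sketched as follows for a 64-antenna array. The steering vector, the source signal, and the noise statistics are synthetic assumptions, and the noise-only covariance (normally estimated, not observed) is taken directly from the simulated noise for simplicity:

```python
import numpy as np

# MVDR (Capon) beamforming sketch: w = S⁻¹v / (vᴴ S⁻¹ v), then wᴴY.
rng = np.random.default_rng(0)
n_ant, L = 64, 256
v = np.exp(-1j*np.pi*np.arange(n_ant)*0.3)[:, None]     # steering vector (64×1)
a = np.exp(2j*np.pi*0.05*np.arange(L))[None, :]         # source beat signal (1×L)
N = (rng.standard_normal((n_ant, L))
     + 1j*rng.standard_normal((n_ant, L))) / 10          # sensor noise (64×L)

Y = v @ a + N                                   # model of equation (5)
S_N = (N @ N.conj().T) / L                      # noise covariance (simplified)
S_inv = np.linalg.inv(S_N)
w = (S_inv @ v) / (v.conj().T @ S_inv @ v)      # MVDR weights, wᴴv = 1
combined = (w.conj().T @ Y).ravel()             # single enhanced de-ramped signal
```

The distortionless constraint wᴴv = 1 means the combined output tracks the source signal a while suppressing off-axis noise.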
To obtain the time evolution of the chest movement, range profiles corresponding to successive de-ramped signals are arranged into a matrix with each row and column representing a range profile signal and a range bin, respectively. The constructed matrix is called a range-time profile consisting of slow time and range axes, as shown in
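The construction above can be sketched with a synthetic target occupying a single range bin, whose phase is modulated along slow time by a breathing-like displacement (all values are assumptions):

```python
import numpy as np

# Build a range-time profile: FFT of each successive de-ramped chirp forms
# one row (rows: slow time, columns: range bins).
n_chirps, n_samples = 128, 256
fast = np.arange(n_samples)
slow_t = np.arange(n_chirps) * 0.05                  # 50 ms chirp interval
disp = 0.5 * np.sin(2*np.pi*0.25*slow_t)             # breathing-like phase (rad)

# Each chirp: a single tone at range bin 32, phase-modulated along slow time.
chirps = np.exp(1j*(2*np.pi*(32/n_samples)*fast[None, :] + disp[:, None]))
range_time = np.fft.fft(chirps, axis=1)              # stacked range profiles

bin_idx = np.argmax(np.abs(range_time).mean(axis=0)) # subject's range bin
movement = np.angle(range_time[:, bin_idx])          # phase along slow time
```

The phase of the dominant range bin recovers the breathing displacement along the slow-time axis.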
The coordinates provided by the camera correspond to the uppermost surface layer of the subject, which may include clothing. This is because the depth sensor uses light with very low penetration into surfaces. When the subject is wearing clothing, tracking the movement of the uppermost layer rather than the chest wall itself would reduce accuracy, since clothing may attenuate chest wall movements, making such subtle movements more difficult to detect from the uppermost layer's surface movements. To determine the exact point representing the chest wall movement in 3-D space, a search algorithm based on either signal power or degree of periodicity is used. The signal-power algorithm compares the energy levels of the signals corresponding to the range bins at distances greater than (but in the vicinity of) the subject's distance estimated by the camera. The signal with the highest energy level is selected as the source of movement, an example of which is illustrated in
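The signal-power search can be sketched as follows; the matrix sizes, noise level, and search window are assumptions:

```python
import numpy as np

# Among range bins at and just beyond the camera-estimated distance, pick
# the bin whose slow-time signal carries the highest energy.
rng = np.random.default_rng(1)
range_time = rng.standard_normal((200, 64)) * 0.1        # slow time × range bins
range_time[:, 41] += np.sin(2*np.pi*0.3*np.arange(200))  # strong mover at bin 41

camera_bin = 40                                           # bin from camera depth
candidates = range(camera_bin, camera_bin + 5)            # search slightly beyond
energies = {b: np.sum(range_time[:, b]**2) for b in candidates}
best_bin = max(energies, key=energies.get)                # bin with most energy
```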
Assume a single-tone periodic signal h[k], k = 1, 2, 3, . . . , l, with a period duration of T and a signal length of l. The signal can be reshaped into an m×L matrix with each row representing a single cycle (L = T), as below:
Since h[k] is a periodic signal, the rows of the reshaped matrix H are linearly dependent vectors, leading to a matrix of rank one. By performing singular value decomposition (SVD), H is decomposed into three matrices U (m×m), Σ (m×L), and V (L×L) as H = UΣV*, where Σ is a diagonal matrix containing the singular values of H on the main diagonal, as given below:
The number of non-zero elements in Σ equals the rank of H. Therefore, Σ contains only one non-zero element, i.e., μ_1, if h[k] is a periodic signal. In this case, the other elements, i.e., μ_2, μ_3, μ_4, etc., converge to zero. As a result, the ratio of the largest to the second-largest singular value, μ_1/μ_2, converges to infinity. It should be noted that this phenomenon happens if and only if the length of the rows (L) in H equals the period duration of the signal, i.e., L = T. This formulation can be extended to quasi-periodic signals, such as the chest movements induced by cardiopulmonary activity. For quasi-periodic signals, the ratio of the largest to the second-largest singular value is expected to be a large number, given that the length of the rows is selected equal to the period of the signal. However, if the assumed period differs from the actual period of the signal, the rows of H will be linearly independent, and thus the ratio of the singular values will not be large. In this work, to locate the most representative point, the ratio μ_1/μ_2 is calculated for the same range bins as selected for the signal-power method. For this purpose, the ratio is calculated over the assumed periods for the respiratory (2-10 s) and heartbeat (0.5-1.25 s) signals. The point resulting in the maximum ratio, i.e., revealing the highest degree of periodicity, is selected as the desired point.
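The SVD periodicity search can be sketched as follows. The sampling rate and the test signal are assumptions, and the candidate window is narrowed to 2-6 s here (rather than the full 2-10 s respiratory range) so that the toy signal's period multiples do not also score as rank-one:

```python
import numpy as np

# Score each candidate period by reshaping the signal into rows of that
# length and taking the ratio of the two largest singular values.
fs = 20.0                                          # slow-time sampling rate (Hz)
t = np.arange(0, 60, 1/fs)
sig = np.sin(2*np.pi*0.25*t) + 0.5*np.sin(2*np.pi*0.5*t)   # true period: 4 s

def periodicity_ratio(x, period_samples):
    m = len(x) // period_samples                   # complete cycles available
    H = x[:m*period_samples].reshape(m, period_samples)
    s = np.linalg.svd(H, compute_uv=False)
    return s[0] / max(s[1], 1e-15)                 # guard against exact rank one

candidates = np.arange(int(2*fs), int(6*fs))       # assumed periods, 2-6 s
ratios = [periodicity_ratio(sig, L) for L in candidates]
period_s = candidates[int(np.argmax(ratios))] / fs # recovered period (seconds)
```

Only at the true period do the rows become (nearly) linearly dependent, so the ratio spikes there.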
The phase of the column corresponding to the subject's range bin represents the chest movement. However, the corresponding range signal is often corrupted by direct-current (DC) values of external interferences, i.e., ∈_I and ∈_Q, which are associated with the in-phase (I) and quadrature (Q) signal components, respectively. These DC terms distort the movement signal; hence, a strategy for pre-empting the phase distortion caused by DC is required. Given I(t) = A_T A_R cos(2πϕ) + ∈_I and Q(t) = A_T A_R sin(2πϕ) + ∈_Q, the following term is obtained by reformulating the relationship between I(t) and Q(t):
which defines a circular constellation centered at (∈I, ∈Q). To correct the constellation, the DC terms should be determined. To this end, an optimization problem is defined by rearranging (8):
min |√((I(t) − ∈_I)² + (Q(t) − ∈_Q)²) − A_T A_R|,  (9)
which is minimized through gradient descent in terms of ∈_I, ∈_Q, and A_T A_R. As such, the DC terms are offset by shifting and scaling the in-phase and quadrature components with respect to the optimum values of ∈_I, ∈_Q, and A_T A_R, respectively.
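The DC-offset estimation can be sketched as a gradient descent that fits the circle centre (∈_I, ∈_Q) and radius A_T·A_R to the I/Q constellation, descending on the squared radial residual (a smooth variant of the objective in (9)). The true offsets, radius, arc extent, learning rate, and iteration count below are synthetic assumptions:

```python
import numpy as np

# Synthetic arc of the I/Q constellation: radius 1.7, centre (0.6, -0.4).
ang = 2*np.pi*np.linspace(0, 0.8, 400)
I = 1.7*np.cos(ang) + 0.6
Q = 1.7*np.sin(ang) - 0.4

eI, eQ, A = 0.0, 0.0, 1.0                       # initial guesses
lr = 0.05
for _ in range(10000):
    r = np.sqrt((I - eI)**2 + (Q - eQ)**2)      # distance to current centre
    err = r - A                                 # signed radial residual
    eI += lr * np.mean(err * (I - eI) / r)      # descend on mean(err²)/2
    eQ += lr * np.mean(err * (Q - eQ) / r)
    A  += lr * np.mean(err)
corrected = (I - eI) + 1j*(Q - eQ)              # re-centred constellation
```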
After DC compensation, the phase of the signal is calculated by arctan(Q(t)/I(t)),
which provides the movement information. An exemplary movement signal is shown in
To obtain the respiratory and heartbeat signals, the unwrapped phase is band-pass (BP) filtered within the ranges of 0.1-0.5 Hz and 0.8-2.5 Hz, respectively. Examples of the extracted respiratory and heartbeat signals are shown in
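The band separation can be sketched as follows. The synthetic phase signal, the sampling rate, and the ideal FFT-mask filter (standing in here for a band-pass filter) are assumptions:

```python
import numpy as np

# Synthetic unwrapped phase: 0.25 Hz breathing plus a small 1.2 Hz heartbeat.
fs = 20.0                                        # slow-time sampling rate (Hz)
t = np.arange(0, 40, 1/fs)
phase = 1.0*np.sin(2*np.pi*0.25*t) + 0.05*np.sin(2*np.pi*1.2*t)

def bandpass(x, lo, hi, fs):
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1/fs)
    X[(f < lo) | (f > hi)] = 0                   # ideal band mask
    return np.fft.irfft(X, n=len(x))

resp = bandpass(phase, 0.1, 0.5, fs)             # 0.1-0.5 Hz respiration band
heart = bandpass(phase, 0.8, 2.5, fs)            # 0.8-2.5 Hz heartbeat band

f = np.fft.rfftfreq(len(t), 1/fs)
rr = 60 * f[np.argmax(np.abs(np.fft.rfft(resp)))]    # breaths per minute
hr = 60 * f[np.argmax(np.abs(np.fft.rfft(heart)))]   # beats per minute
```

Here the spectral peak of each band-limited signal gives the rate estimate.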
In some embodiments, the optical camera, the depth camera, the radar, and a computational unit that performs the signal processing and algorithmic computations to estimate heart rate and respiratory rate may be used in combination. This combination, which may be packaged into one or more physical components, is referred to as a camera-radar unit. In some embodiments, multiple camera-radar units may be used to monitor one or more subjects from different angles. However, two radars operating in the same environment may create interference in the radar signals, negatively impacting heart rate and respiratory rate estimation accuracy. To address this, a first camera-radar unit can communicate with a second camera-radar unit, directly or indirectly through a third device, to turn off the radar on the second camera-radar unit and thereby eliminate radar signal interference.
To quantify the confidence level for the estimated respiratory and heart rates, a signal quality index is used. A signal quality index is a value representative of how reliably heart rate and respiratory rate values of acceptable accuracy can be extracted from the radar signal. As shown in
where C_RR, C_HR, X_RR, X_HR,
can be used as a signal quality index to examine the confidence of the readings. As such, the readings (also referred to as estimations) are considered valid if the signal quality index holds a value larger than a pre-defined threshold (γ). In some embodiments, the estimated values of respiratory rate and/or heart rate are recorded in the database module if the signal quality index is larger than a pre-defined threshold.
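The disclosure's exact signal quality index is not reproduced in the text above; as an illustrative stand-in, the sketch below scores a band-limited signal by the fraction of its spectral energy concentrated at the dominant peak and accepts the reading only above a threshold γ. The index definition, threshold value, and signals are all assumptions:

```python
import numpy as np

# Hypothetical quality index: fraction of spectral energy at the peak bin.
def quality_index(x):
    mag = np.abs(np.fft.rfft(x - np.mean(x)))**2
    return mag.max() / mag.sum()                 # in (0, 1]

fs = 20.0
t = np.arange(0, 40, 1/fs)
clean = np.sin(2*np.pi*0.25*t)                   # strongly periodic reading
noisy = np.random.default_rng(3).standard_normal(len(t))

gamma = 0.5                                      # pre-defined threshold (assumed)
valid = quality_index(clean) > gamma             # True → record in database
invalid = quality_index(noisy) > gamma           # False → discard reading
```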
In this disclosure, the descriptions of the various embodiments have been presented for purposes of illustration and are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The appended claims should be construed broadly, to include other variants and embodiments, which may be made by those skilled in the art upon review of the technology disclosed herein.
The present application claims priority to U.S. Provisional Application No. 63/285,801, filed Dec. 3, 2021, titled “A Camera-Augmented FMCW Radar System for Cardiopulmonary System Monitoring,” which is hereby incorporated by reference in its entirety.
The invention was made with government support under 80NSSC20C0117 awarded by NASA. The government has certain rights in the invention.