The present application relates to the accurate estimation of human heart rate and facial blood volume distribution using a camera, based on image processing, deep learning and signal processing technologies.
Human heart rate and facial blood volume distribution are important indicators of human physiological health. At present, the main ways to measure the human heart rate are ECG signal detection, photoelectric signal detection and other contact-based methods. A common characteristic of these methods is that the detection equipment needs to be attached to the skin of the person being tested so that the heart rate can be estimated from skin potential activity. The need to wear sensors, however, limits the applicable scope of this kind of method. Remote detection of human physiological indicators through cameras has therefore become a hot spot in current research. Because the measurement environment in actual application scenarios is complex, remote detection methods are easily subject to interference. In order to eliminate the interference mentioned above, signal decomposition methods such as wavelet decomposition, independent component analysis (ICA), principal component analysis (PCA) and the Hilbert-Huang transform (HHT) are usually used alone or in combination to remove noise. The influence of noise cannot be well eliminated by a signal decomposition method alone, mainly for the following two reasons: (1) a signal decomposition model is often a general decomposition algorithm into which prior information about the physiological features of the human heart rate is not introduced; (2) the selection of decomposition results depends on subjective judgment, that is, the signal closest to the heart rate features is selected from the decomposition results, which lacks an objective judgment standard. In order to improve the robustness and accuracy of remote detection, the present application adopts deep learning technology, spectrum analysis and correlation calculation methods to detect the human heart rate and facial blood volume distribution, and at the same time adopts a Kalman filtering algorithm to fuse the heart rate results obtained in a variety of ways, thus realizing accurate estimation of the human heart rate.
In order to improve the detection efficiency of the human heart rate and facial blood volume distribution, the present application provides a new detection method and system for human heart rate and facial blood volume distribution. The facial data in the video are analyzed and processed to realize the estimation of the human heart rate and facial blood volume distribution by means of model prediction and signal processing.
The present application is realized by the following technical solution: a video-based method for accurately estimating human heart rate and facial blood volume distribution, including the following steps:
The present application further provides a video-based system for accurately estimating heart rate and facial blood volume distribution, including:
Compared with the prior art, the present application has the following advantages:
The present application will be further described in detail with reference to the attached drawings and specific embodiments.
(1.1) A convolutional network model is used to detect the face and the face key points in the video frames, and then the face image sequence and the face key position point sequence in the time dimension are respectively extracted, as shown in Formula 1, where MTCNN( ) is the convolutional network model, frame_i is the i-th video frame image, face_i is the face image extracted from the i-th video frame, and critical_pos_i is the set of key position points of the face image.
face_i, critical_pos_i = MTCNN(frame_i) (1)
The form of the face image sequence is shown in Formula 2, where: face_seq is the face image sequence, face_i is the face image corresponding to the i-th video frame, and T is the length of the video.
face_seq = {face_1, face_2, ..., face_i, ..., face_T} (2)
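As a non-limiting illustrative sketch of this step, the following Python code extracts the face image sequence and the key position point sequence from a video; it assumes the facenet-pytorch implementation of MTCNN and OpenCV for frame reading, and the helper name extract_face_sequence is hypothetical.

```python
# Sketch of step (1.1): extract the face image sequence and key position
# points from a video, assuming the facenet-pytorch MTCNN detector.
import cv2
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=False)  # keep one face per frame

def extract_face_sequence(video_path):
    """Return face_seq and critical_pos_seq over the T frames of the video."""
    cap = cv2.VideoCapture(video_path)
    face_seq, critical_pos_seq = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        boxes, probs, landmarks = mtcnn.detect(rgb, landmarks=True)
        if boxes is None:
            continue  # skip frames where no face is found
        x1, y1, x2, y2 = boxes[0].astype(int)
        face_seq.append(rgb[y1:y2, x1:x2])      # face_i
        critical_pos_seq.append(landmarks[0])   # critical_pos_i (5 landmark points)
    cap.release()
    return face_seq, critical_pos_seq
```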
To facilitate the analysis of the signal distribution, roi blocks with a size of R×R are used to segment each face image in the sequence, and roi block sequences in the time dimension are thereby obtained, as shown in Formula 4, where: face_roi_i represents the i-th roi block image sequence, and face_roi_seq is the set of roi block image sequences.
face_roi_seq = {face_roi_1, face_roi_2, ..., face_roi_i, ..., face_roi_(m×n)} (4)
On this basis, each roi block image sequence is compressed, as shown in Formula 5, where: face_roi_seq is the set of roi block image sequences, PCompress( ) is the compression function that calculates the average pixel intensity of an image, and face_roi_sig is the result of the PCompress( ) function.
face_roi_sig=PCompress(face_roi_seq) (5)
where:
face_roi_sig = {face_roi_sig_1, ..., face_roi_sig_i, ..., face_roi_sig_(m×n)} (6)
In Formula 6, face_roi_sig_i is the compressed signal corresponding to the i-th roi block, and m×n is the total number of elements in face_roi_sig.
where T is the length of the video and m×n is the total number of elements in roi_sig_r.
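The segmentation and compression of Formulas 4 to 6 can be sketched as follows, assuming each face image has already been resized so that it divides evenly into m×n blocks of size R×R and that PCompress( ) is realized as the per-block mean pixel intensity; the block size R=16 is an illustrative choice.

```python
import numpy as np

def pcompress_roi(face_seq, R=16):
    """Split each face image into R x R roi blocks and compress each block
    sequence into a 1-D signal by averaging pixel intensity (Formulas 4-6)."""
    # All face images are assumed to share the same H x W shape after resizing.
    H, W = face_seq[0].shape[:2]
    m, n = H // R, W // R
    T = len(face_seq)
    face_roi_sig = np.zeros((m * n, T))
    for t, face in enumerate(face_seq):
        gray = face.mean(axis=2) if face.ndim == 3 else face
        for i in range(m):
            for j in range(n):
                block = gray[i * R:(i + 1) * R, j * R:(j + 1) * R]
                face_roi_sig[i * n + j, t] = block.mean()  # average pixel intensity
    return face_roi_sig  # shape (m*n, T): one compressed signal per roi block
```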
In Formula 13, Volume( ) is the function for estimating the blood volume distribution, and its specific form is shown in Formula 14.
In Formula 14, fs_ref is the frequency spectrum of the reference signal, fs_roi is the frequency spectrum of the face roi signals, ⊗ is the convolution operator, and m and n are the maximum indices of the roi blocks in the horizontal and vertical directions.
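One possible, non-authoritative realization of Formulas 13 and 14 is sketched below: the magnitude spectrum of each roi signal is compared with the magnitude spectrum of the reference signal, and the per-block responses are arranged on the m×n grid as the blood volume map. Replacing the ⊗ operator with a normalized spectral correlation is an assumption made only for illustration.

```python
import numpy as np

def blood_volume_distribution(ref_sig, face_roi_sig, m, n):
    """Estimate the facial blood volume map from the spectrum of the reference
    signal and the spectra of the roi signals (cf. Formulas 13-14)."""
    fs_ref = np.abs(np.fft.rfft(ref_sig - ref_sig.mean()))
    volume = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            sig = face_roi_sig[i * n + j]
            fs_roi = np.abs(np.fft.rfft(sig - sig.mean()))
            # Normalized spectral correlation used as the blood volume response
            denom = np.linalg.norm(fs_ref) * np.linalg.norm(fs_roi) + 1e-12
            volume[i, j] = float(fs_ref @ fs_roi) / denom
    return volume  # higher values indicate stronger pulsatile content in the block
```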
The form of critical_pos_i is shown in Formula 16, where k is the total number of face key position points and i denotes the i-th video frame.
critical_pos_i = {pos_1^i, pos_2^i, ..., pos_k^i} (16)
Based on the key position points of the face, the image sequence consisting of the forehead and the left and right cheek regions in the time dimension is selected as the critical roi, and the critical roi sequence is compressed as shown in Formula 17.
sig_c_i = PCompress(img_c_i) (17)
where:
sig_c = {sig_c_1, sig_c_2, ..., sig_c_i, ..., sig_c_T}
In the above formula, sig_c is the result of compressing the critical roi sequence, and T is the video length.
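A sketch of the critical roi compression of Formula 17 is given below, assuming the five MTCNN landmarks (eyes, nose, mouth corners); the way the forehead and cheek patches are placed around the landmarks and the box size are illustrative assumptions, not the claimed regions.

```python
import numpy as np

def compress_critical_roi(face_seq, critical_pos_seq, box=20):
    """Compress the critical roi (forehead and both cheeks) into sig_c
    (Formula 17); landmark indexing and patch placement are assumptions."""
    sig_c = []
    for face, pts in zip(face_seq, critical_pos_seq):
        gray = face.mean(axis=2) if face.ndim == 3 else face
        le, re, nose = pts[0], pts[1], pts[2]            # left eye, right eye, nose
        forehead = ((le + re) / 2 - [0, 1.2 * box]).astype(int)
        l_cheek = ((le + nose) / 2 + [0, 0.6 * box]).astype(int)
        r_cheek = ((re + nose) / 2 + [0, 0.6 * box]).astype(int)
        vals = []
        for cx, cy in (forehead, l_cheek, r_cheek):
            patch = gray[max(cy - box, 0):cy + box, max(cx - box, 0):cx + box]
            if patch.size:
                vals.append(patch.mean())
        sig_c.append(np.mean(vals))                      # sig_c_i = PCompress(img_c_i)
    return np.asarray(sig_c)                             # length T
```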
On this basis, the heart rate value is estimated as shown in Formula 22, where: heart_rate_pre is the estimated heart rate value, mean( ) is the mean calculating function, and max_reg( ) is a function for searching for the heart rate range corresponding to the maximum probability.
heart_rate_pre=mean(max_reg(res_pro)) (22)
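Formula 22 can be sketched as follows, assuming res_pro is a one-dimensional probability vector over evenly spaced heart rate bins and interpreting max_reg( ) as a search for the contiguous window of bins with the largest total probability; the bin range and window width are assumptions.

```python
import numpy as np

def estimate_heart_rate(res_pro, hr_min=40.0, hr_max=180.0, width=3):
    """heart_rate_pre = mean(max_reg(res_pro))  (cf. Formula 22).
    res_pro: probability distribution over evenly spaced heart rate bins."""
    bins = np.linspace(hr_min, hr_max, len(res_pro))
    # max_reg: window of `width` consecutive bins with the maximum total
    # probability, i.e. the heart rate range of highest likelihood.
    window_sums = np.convolve(res_pro, np.ones(width), mode="valid")
    start = int(np.argmax(window_sums))
    region = bins[start:start + width]
    return float(region.mean())   # mean of the maximum-probability region
```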
The Kalman filter model is shown in Formulas 23 and 24, where x_k and z_k are the predicted value and the measured value respectively, A and B are the state matrix and the control matrix respectively, H is the transformation matrix from the prediction space to the measurement space, and w_{k-1} and v_k are the prediction error and the measurement error respectively.
x_k = A x_{k-1} + B u_k + w_{k-1} (23)
z_k = H x_k + v_k (24)
According to Formulas 25 and 26, the heart rate values estimated by the two measurement methods mentioned in steps (2) and (3) are fused, where x_k is the predicted heart rate value estimated in step (3), z_k is the heart rate value estimated in step (2), K is the fusion coefficient, H represents the transformation matrix from the prediction space to the measurement space, and H=1 in heart rate measurement. P_k is the prediction variance, which corresponds to the predicted probability value in step (3). R_k is the measurement variance, which corresponds to the signal-to-noise ratio of the reference signal in step (2.3).
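With H = 1, the scalar fusion implied by Formulas 23 to 26 can be sketched as below; mapping the model probability to P_k and the reference-signal signal-to-noise ratio to R_k with reciprocal heuristics is an assumption, not the claimed mapping.

```python
def fuse_heart_rate(hr_model, hr_spectrum, model_prob, ref_snr):
    """Fuse the model-predicted heart rate (step (3)) with the spectrum-based
    measurement (step (2)) using a scalar Kalman update with H = 1."""
    # Heuristic variances: higher confidence -> smaller variance (assumption).
    P_k = 1.0 / max(model_prob, 1e-6)   # prediction variance from model probability
    R_k = 1.0 / max(ref_snr, 1e-6)      # measurement variance from reference SNR
    K = P_k / (P_k + R_k)               # fusion (Kalman gain) coefficient
    hr_fused = hr_model + K * (hr_spectrum - hr_model)
    return hr_fused, (1.0 - K) * P_k    # fused value and updated variance
```

For example, with a clean reference signal (large ref_snr) the gain K approaches 1 and the fused value follows the spectrum-based measurement, whereas a noisy reference signal pulls the result toward the model prediction.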
The present application discloses a video-based system for accurately estimating human heart rate and facial blood volume distribution.
An image detection module is used for detecting the human face region in the video frames and extracting the human face image sequence and the face key position point sequence in the time dimension, and for extracting a global face signal and a set of face roi signals based on the face image sequence.
A preprocessing module preprocesses the global face signal and the roi signals extracted by the image detection module.
A frequency spectrum-based heart rate estimation module is used for estimating the heart rate based on a reference signal. The reference signal is calculated in a linear weighting manner from the set of face roi signals, and the heart rate value is obtained according to the extremum of the frequency spectrum of the reference signal; on this basis, the facial blood volume distribution is calculated according to the frequency spectrum of the reference signal and the frequency spectra of the roi signals.
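An illustrative sketch of this module is given below: the reference signal is formed as a linear weighting of the roi signals (uniform weights are assumed here) and the heart rate is taken from the spectral peak inside a plausible pulse band; the band limits are assumptions.

```python
import numpy as np

def spectrum_heart_rate(face_roi_sig, fps, weights=None, band=(0.7, 3.0)):
    """Estimate heart rate (bpm) from the frequency-spectrum peak of a linearly
    weighted reference signal; uniform weights and the 0.7-3.0 Hz band are
    illustrative assumptions."""
    if weights is None:
        weights = np.ones(face_roi_sig.shape[0]) / face_roi_sig.shape[0]
    ref_sig = weights @ face_roi_sig                  # linear weighting of roi signals
    ref_sig = ref_sig - ref_sig.mean()
    spectrum = np.abs(np.fft.rfft(ref_sig))
    freqs = np.fft.rfftfreq(len(ref_sig), d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])    # roughly 42-180 bpm
    peak = freqs[mask][np.argmax(spectrum[mask])]     # extremum of the spectrum
    return 60.0 * peak, ref_sig
```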
A multimodal heart rate estimation model is constructed from LSTM and RCNN architectures and is used for estimating the heart rate value based on the heart rate distribution probability.
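A minimal PyTorch sketch of such a probability-producing temporal model is shown below; the layer sizes, the number of heart rate bins and the omission of the RCNN branch are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class HeartRateLSTM(nn.Module):
    """Toy LSTM head that maps a temporal feature sequence (e.g. roi signals
    per frame) to a probability distribution over heart rate bins (res_pro)."""
    def __init__(self, in_features, hidden=64, num_bins=141):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_bins)

    def forward(self, x):                     # x: (batch, T, in_features)
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1])        # use the last time step
        return torch.softmax(logits, dim=-1)  # heart rate distribution probability

# Example: a 10-second clip at 30 fps with one feature per roi block (m*n = 64)
model = HeartRateLSTM(in_features=64)
res_pro = model(torch.randn(1, 300, 64))      # (1, 141) probability vector
```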
A fusion module is used for obtaining the fused heart rate value based on the results estimated by the frequency spectrum-based heart rate estimation module and the multimodal heart rate estimation model.
The above embodiments are only the preferred embodiments of the present application, and it should be pointed out that a person skilled in the technical field can make several improvements and variations without departing from the technical principle of the present application, and these improvements and variations should also be regarded as falling within the protection scope of the present application.
Number | Date | Country | Kind |
---|---|---|---|
202010448368.X | May 2020 | CN | national |
The present application is a continuation of International Application No. PCT/CN2021/080905, filed on Mar. 16, 2021, which claims priority to Chinese Application No. 202010448368.x, filed on May 25, 2020, the contents of both of which are incorporated herein by reference in their entireties.
Number | Name | Date | Kind |
---|---|---|---|
7756577 | Kroll et al. | Jul 2010 | B1 |
20160287181 | Han et al. | Oct 2016 | A1 |
20170367590 | Sebe et al. | Dec 2017 | A1 |
20210209388 | Ciftci | Jul 2021 | A1 |
Number | Date | Country |
---|---|---|
106845395 | Jun 2017 | CN |
107692997 | Feb 2018 | CN |
109460737 | Mar 2019 | CN |
109602412 | Apr 2019 | CN |
109700450 | May 2019 | CN |
110321781 | Oct 2019 | CN |
110458101 | Nov 2019 | CN |
111626182 | Sep 2020 | CN |
Entry |
---|
Wen Yang et al “Heart Rate Estimation from Facial Videos Based on Convolutional Neural Network”, IEEE 2018 (Year: 2018). |
International Search Report (PCT/CN2021/080905); Date of Mailing: Jun. 16, 2021. |
CN First Office Action(202010448368.X); Date of Mailing: Dec. 15, 2020. |
Notice Of Allowance(202010448368.X); Date of Mailing: Feb. 3, 2021. |
ECG Heartbeat Classification Based on ResNet and Bi-LSTM; Date of Mailing: Jan. 17, 2020. |
Number | Date | Country | |
---|---|---|---|
20220218218 A1 | Jul 2022 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2021/080905 | Mar 2021 | WO |
Child | 17696909 | US |