The present invention relates to a remote photoplethysmography device, a remote photoplethysmography method and a remote photoplethysmography system for registering a first image frame and a second image frame.
Vital signs of a person, for example the heart rate (HR), the respiration rate (RR) or the arterial blood oxygen saturation (SpO2), serve as indicators of the person's current state and as powerful predictors of serious medical events. For this reason, vital signs are extensively monitored in inpatient and outpatient care settings, at home or in further health, leisure and fitness settings.
One way of measuring vital signs is plethysmography. Plethysmography generally refers to the measurement of volume changes of an organ or a body part and in particular to the detection of volume changes due to a cardio-vascular pulse wave traveling through the body of a subject with every heartbeat.
Photoplethysmography (PPG) is an optical measurement technique that evaluates a time-variant change of light reflectance or transmission of an area or volume of interest. PPG is based on the principle that blood absorbs light more strongly than the surrounding tissue, so variations in blood volume with every heartbeat affect transmission or reflectance correspondingly. Besides information about the heart rate, a PPG waveform (also called PPG signal) can comprise information attributable to further physiological phenomena such as respiration. By evaluating the transmittance and/or reflectivity at different wavelengths (typically red and infrared), the blood oxygen saturation can be determined.
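As a simplified, non-limiting illustration of this principle (a commonly used empirical calibration model, assumed here for clarity rather than taken from the description above): the pulsatile (AC) and steady (DC) components of the PPG signal at a red and an infrared wavelength may be combined into a "ratio of ratios"

R = (AC_red/DC_red)/(AC_IR/DC_IR),

from which the saturation may be estimated by an empirically calibrated relation of the form SpO2 ≈ a − b·R, where a and b are device-specific calibration constants.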
Conventional pulse oximeters (also called contact PPG devices herein) for measuring the heart rate and the (arterial) blood oxygen saturation of a subject are attached to the skin of the subject, for instance to a fingertip, earlobe or forehead. Therefore, they are referred to as ‘contact’ PPG devices. Although contact PPG is regarded as a basically non-invasive technique, contact PPG measurement is often experienced as unpleasant and obtrusive, since the pulse oximeter is directly attached to the subject and any cables limit the freedom to move and might hinder a workflow.
Non-contact, remote PPG (rPPG) devices (also called camera-based devices) for unobtrusive measurements have been proposed in the last decade. Remote PPG utilizes light sources or, in general, radiation sources disposed remotely from the subject of interest. Similarly, a detector, e.g., a camera or a photodetector, can be disposed remotely from the subject of interest. Therefore, remote photoplethysmographic systems and devices are considered unobtrusive and well suited for medical as well as non-medical everyday applications. Remote PPG is disclosed, e.g., in W. Wang, A. C. den Brinker, S. Stuijk, and G. de Haan, “Algorithmic principles of remote-PPG,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 7, pp. 1479-1491, 2017 and in M. van Gastel, S. Stuijk, and G. de Haan, “New principle for measuring arterial blood oxygenation, enabling motion-robust remote monitoring,” Scientific Reports, vol. 6, p. 38609, 2016.
Using PPG technology, vital signs can be measured, which are revealed by minute light absorption changes in the skin caused by the pulsating blood volume, i.e. by periodic color changes of the human skin induced by the blood volume pulse. As this signal is very small and hidden in much larger variations due to illumination changes and motion, there is a general interest in improving the fundamentally low signal-to-noise ratio (SNR). An improved PPG signal should thus be free from distortions caused by body motions, variations of the light spectrum, low skin pulsatility and/or pollution by non-skin pixels. There still are demanding situations, with severe motion, challenging environmental illumination conditions, or high required accuracy of the application, where an improved robustness and accuracy of the vital sign measurement devices and methods is required, particularly for the more critical healthcare applications.
Video health monitoring (heart rate, respiration rate, SpO2, actigraphy, delirium, etc.) is a promising emerging field. Its inherent unobtrusiveness has distinct advantages for patients with fragile skin, or in need of long-term vital signs monitoring, such as NICU patients, patients with extensive burns, or COPD patients who have to be monitored at home during sleep. In other settings, such as a general ward or an emergency department, the comfort of contactless monitoring is still an attractive feature.
While this is a promising new field, many challenges have to be overcome. Making the system robust to movements of the patient is currently one of the main challenges, particularly with a view to enabling application in the emergency department.
A means for providing such a robust system for vital signs extraction (e.g., pulse extraction) is, for example, the combination of multi-wavelength channels to eliminate (motion) distortions from a measured signal. To this end, a system may use a multi-spectral camera equipped with Bayer filters to measure the skin at different wavelengths. However, Bayer filters are commonly available as RGB filters but less available as near-infrared (NIR) filters, although NIR wavelengths are particularly advantageous for pulse extraction. Furthermore, multi-spectral camera systems are still considered to be expensive, which is unlikely to change in the mid term (i.e., within about 5 years) and which therefore restricts their broad application.
Alternatively, in order to improve overall system flexibility, multiple monochrome cameras can be used having optical filters for predefined wavelengths or wavelength ranges. Such systems provide a cost-effective alternative to multi-spectral camera systems and additionally allow a higher degree of freedom in wavelength selection (e.g., 760 nm, 800 nm, 905 nm), since they are independent of the availability of, e.g., Bayer filters. Additionally, multiple monochrome camera systems can be equipped with narrow-band filters which enable SpO2 measurement. In contrast, multi-spectral camera systems usually have cross-talk between bands that does not allow SpO2 measurement.
Multiple monochrome camera systems commonly comprise two or more cameras spaced apart from one another and viewing a common region of interest. Since the two or more cameras are spaced apart, their optical paths differ when focusing on the common region of interest. This leads to a displacement of the position of the common region of interest between two image frames acquired by the two or more cameras, a phenomenon commonly referred to as “parallax”. Since the significance of parallax for vital signs extraction depends, among others, on the focal length and on the distance between the region of interest and the camera(s), it can still be considered a significant challenge for the use of multiple monochrome cameras.
An existing solution for the reduction of parallax is image registration/alignment. Thereby, image registration commonly comprises two stages, wherein in a first stage, a calibration is performed (e.g., for the first few frames of recording) in order to estimate a linear transformation model (such as translation, Euclidean, affine or homography) between image frames of different cameras. In a second stage, the estimated model may be applied to subsequent image frames to register said image frames to a reference image frame.
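For illustration only, such a conventional two-stage registration could be sketched as follows in Python with OpenCV (the feature type, matching strategy and thresholds are assumptions of this sketch, not taken from any cited document):

```python
import cv2
import numpy as np

# Stage 1 (calibration): estimate a linear transformation model, here a
# homography, from feature matches between early frames of the two cameras.
def calibrate_homography(ref_frame, other_frame):
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(ref_frame, None)
    kp_oth, des_oth = orb.detectAndCompute(other_frame, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_oth, des_ref)
    src = np.float32([kp_oth[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return homography

# Stage 2: apply the fixed, linear model to every subsequent frame.
def register_linear(other_frame, homography):
    h, w = other_frame.shape[:2]
    return cv2.warpPerspective(other_frame, homography, (w, h))
```

Note that the model is estimated once and then kept fixed, which is exactly the limitation discussed further below.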
US 2018/0262685 A1 discloses a system that addresses the problem of parallax occurring when shooting, e.g., a 360° video using a camera with multiple lenses. The system uses stitching of image frames from multiple cameras. A warp transformation is applied to determine a displacement of pixels of a border region of an acquired area. Furthermore, spatial and/or temporal smoothing is applied and the warp transformation is determined at multiple spatial scales.
VAN GASTEL, Mark; STUIJK, Sander; DE HAAN, Gerard, “Motion robust remote-PPG in infrared”, IEEE Transactions on Biomedical Engineering, vol. 62, no. 5, pp. 1425-1433, 2015 discloses the feasibility of rPPG in the (near-)infrared spectrum, which broadens the scope of applications for rPPG.
ZHOU, Dongxiang; ZHANG, Hong, “Modified GMM background modeling and optical flow for detection of moving objects”, in 2005 IEEE International Conference on Systems, Man and Cybernetics. IEEE, 2005, pp. 2224-2229 discloses detection of moving objects in a noisy background.
GAY-BELLILE, Vincent, et al., “Image registration by combining thin-plate splines with a 3D morphable model”, in 2006 International Conference on Image Processing. IEEE, 2006, pp. 1069-1072 discloses an image deformation model combining thin-plate splines with 3D entities, a 3D control mesh and a camera.
It is an object of the present invention to provide a remote photoplethysmography device, a remote photoplethysmography method and a remote photoplethysmography system that reduce the impact of parallax on the registration of image frames.
In a first aspect of the present invention a remote photoplethysmography device for registering a first image frame acquired by a first imaging unit and a second image frame acquired by a second imaging unit is presented, both the first and the second image frames depicting a common region of interest, the remote photoplethysmography device comprising a processing unit configured to measure a first pixel displacement between the first image frame and the second image frame, to correct the first pixel displacement according to spatial and/or temporal geometric constraints between the first imaging unit and the second imaging unit, and to register the first image frame with the second image frame based on the corrected first pixel displacement.
In a further aspect of the present invention, a remote photoplethysmography system is presented which comprises a first imaging unit configured to acquire a first image frame, a second imaging unit spaced apart from the first imaging unit and configured to acquire a second image frame, and a device for registering the first image frame and the second image frame according to the present invention.
In yet further aspects of the present invention, there are provided a corresponding remote photoplethysmography method, a computer program which comprises program code means for causing a computer to perform the steps of the method disclosed herein when said computer program is carried out on a computer as well as a non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method disclosed herein to be performed.
Preferred embodiments of the invention are defined in the dependent claims. It shall be understood that the claimed method, system, computer program and medium have similar and/or identical preferred embodiments as the claimed device, in particular as defined in the dependent claims and as disclosed herein.
The present invention is based on the idea of using a non-linear adaptive image registration to solve the above-mentioned problems of conventional image registration.
According to the present invention, the processing unit measures a displacement of pixels or of a group of pixels between, for example, two different image frames, preferably acquired by two different imaging units (i.e. cameras) at the same time, wherein the two image frames depict the same object or region of interest. The measured pixel displacement is preferably used to interpolate the image frames in order to bring them into the same alignment. Furthermore, spatial and/or temporal constraints between different image frames are used to restrict the measurement of the pixel-to-pixel displacement or of the displacement between a predefined group of pixels in order to smooth the interpolation. Thus, the correction of the measured pixel displacement can be based on solely spatial constraints, solely temporal constraints or on both spatial and temporal (spatio-temporal) constraints jointly. Preferably, in a case where temporal constraints are exploited, a time window size may be defined. Thereby, the term “time window size” may be understood as the length of the time window of a buffer of images. Thus, when a temporal image sequence is used to estimate the temporal constraints of a model, the temporal image sequence may be set to two images in time (t→t+1), but may alternatively be set to several images in time (t→t+N).
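A minimal sketch of this "time window size" notion, assuming a simple ring buffer (all names are illustrative, not part of the description above):

```python
from collections import deque

# Illustrative time window size N: a buffer holding the last N frames, so
# that temporal constraints can be estimated either between consecutive
# frames (t -> t+1) or across the whole buffered window.
N = 5  # arbitrary illustrative choice
frame_buffer = deque(maxlen=N)

def push_frame(frame):
    frame_buffer.append(frame)
    if len(frame_buffer) == N:
        # Oldest and newest buffered frames span the full time window.
        return frame_buffer[0], frame_buffer[-1]
    return None
```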
The present invention has the advantage and strength that it allows non-linear registration between different imaging units. Such non-linear registration is adaptive to the video content and can therefore be used to improve the registration of image frames depicting an object in motion. Thereby, the image registration can be applied to scenes with depth changes or subject/object movements in the acquired scene (e.g., changes of the distance between the measured object and the respective imaging unit).
Thus, the present invention shows a clear advantage in comparison to transformation-based image registration (as e.g. disclosed in US 2018/0262685 A1), since transformation-based image registration is linear and therefore has clear limitations when a scene with significant depth and/or objects with 3D geometries is to be measured, as it is only effective for 2D planes.
Furthermore, transformation-based image registration uses the estimation of a registration model which is not adaptive to the video content, since such a registration model is only valid for a subject (e.g., a face) at the fixed distance-to-camera used for the model estimation. If the subject moves and changes its distance-to-camera during the measurement, the transformation-based registration results are erroneous and may introduce, for example, color gradients and/or artifacts into the extracted vital signs signal, which are considered harmful for health signal extraction, especially for extraction methods such as the blood volume pulse vector (PBV) method (as e.g. disclosed in G. de Haan, A. van Leest, “Improved motion robustness of remote-PPG by using the blood volume pulse signature”, Physiol. Meas., vol. 35, no. 9, pp. 1913-1922, October 2014), the chrominance-based method (as e.g. disclosed in G. de Haan and V. Jeanne, “Robust Pulse Rate from Chrominance-Based rPPG,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 10, pp. 2878-2886, October 2013) and the plane-orthogonal-to-skin (POS) method (as e.g. disclosed in W. Wang, A. C. den Brinker, S. Stuijk, and G. de Haan, “Algorithmic principles of remote-PPG,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 7, pp. 1479-1491, 2017). In particular, the measurement of blood oxygen saturation depends on the amplitude of color changes and is therefore highly sensitive to registration artifacts. With the present invention, this can be avoided, since the registration model is adaptive to the video content.
Especially in a setup with large parallax between imaging units, clear scene depth, and/or object motions (where distance between object and imaging unit varies in time), the non-linear and adaptive image registration is considered to be advantageous.
Furthermore, image registration according to the present invention can effectively reduce the color artifacts caused by imperfect image registration that are well known for linear image registration methods.
In addition, in health monitoring systems where an integrated solution (e.g., having a single optical path) is not common and/or too expensive, the present invention is considered beneficial: a health monitoring system comprising the device according to the present invention and multiple imaging units, preferably multi-spectral NIR cameras, can be used as an effective alternative, since the quality of image registration between the multiple imaging units can be significantly improved. Thus, the robustness/accuracy of vital signs extraction, especially of SpO2 monitoring that preferably uses multiple NIR wavelengths for calibration, can be improved.
In a further embodiment of the device according to the present invention, the first image frame and the second image frame are acquired at a same point in time. Thereby, it is not only preferable that the first and the second image frames used for registration are acquired by the two imaging units at the same time, but also that the two imaging units are synchronized to one another.
In a yet further embodiment of the device according to the present invention, the processing unit is configured to measure, as the first pixel displacement, a pixel-to-pixel displacement between pixels or a displacement between a group of pixels inside the region of interest. The measurement of the pixel displacement is preferably done by firstly selecting the first or the second image frame as a reference image frame. Secondly, a displacement of each pixel or a predefined group of pixels between the reference image frame and the non-reference image frame is measured. Especially in cases where accuracy requirements are reduced, e.g. due to redundant information of measurement data, measurement of displacement of a group of pixels inside the region of interest instead of pixelwise displacement can show advantages, since computing time can be reduced and performance of the registration can be increased.
In a further embodiment of the device according to the present invention, the processing unit is configured to measure the first pixel displacement based on a dense optical flow acquired for each individual pixel inside the region of interest or for a group of pixels inside the region of interest. The term “dense optical flow” refers to a pattern of apparent motion of objects, surfaces and/or edges between two image frames, either acquired at the same time by two different imaging units spaced apart from each other (parallax), or acquired at two different times by one imaging unit (i.e. when the object is in motion). Furthermore, dense optical flow can be defined as a distribution of apparent velocities of movement of pixels or a group of pixels between two different image frames.
In a yet further embodiment of the device according to the present invention, the dense optical flow is based on one of the Lucas-Kanade flow, the Farneback flow, the Horn-Schunck flow, the block-matching flow, the deep-nets flow and/or the 3DRS flow. Thereby, the main differences between these optical flow measurement methods are their accuracy and robustness in finding pixel matches and pixel displacements, respectively, wherein higher accuracy leads to higher implementation costs of the respective dense optical flow measurement method. Furthermore, the different dense optical flow measurement methods differ in terms of efficiency (e.g., 3DRS is fast in computation). For example, some of the above-mentioned optical flow measurement methods struggle with plain image regions having little texture, while others (namely more advanced/recent dense optical flow measurement methods) can also cope with such featureless image regions.
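By way of example, a dense optical flow field may be measured with the Farneback method as follows (a minimal Python/OpenCV sketch; the parameter values are illustrative assumptions, not taken from the description):

```python
import cv2

# Measure the first pixel displacement as a dense optical flow field with
# the Farneback method; parameter values are illustrative assumptions.
def measure_displacement(ref_gray, nonref_gray):
    # flow[y, x] = (dx, dy): apparent displacement of the pixel at (x, y)
    # from the reference frame to the non-reference frame.
    flow = cv2.calcOpticalFlowFarneback(
        ref_gray, nonref_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow  # float32 array of shape (H, W, 2)
```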
In a further embodiment of the device according to the present invention, the processing unit is further configured to analytically calculate a second pixel displacement based on the spatial geometric constraints and/or the temporal geometric constraints and to smooth the first pixel displacement by calculating a mean value of the first pixel displacement and the second pixel displacement. Thus, for example, the value of the first pixel displacement can be refined based on predetermined spatial and/or temporal constraints, resulting in improved image registration. In particular, imprecision of registered pixels or a group of pixels in the region of interest can be minimized which improves evaluation accuracy of health related parameters.
In a yet further embodiment of the device according to the present invention, the processing unit is further configured to analytically calculate a second pixel displacement based on the spatial geometric constraints and/or the temporal geometric constraints, to detect outliers in the measured first pixel displacement by comparing said first pixel displacement with the second pixel displacement and to correct the first pixel displacement by rejecting the detected outliers. According to this embodiment of the present invention, outliers resulting from imprecise registration, i.e. caused by parallax or subject motion, can be rejected and therefore do not cause inaccuracy in the measurement of health parameters. In other words, the outliers that have been removed are not used in the analysis of the health parameters. Instead of the measurement data related to the outliers, an average value (i.e. mean value) of the measurement data of pixels or groups of pixels adjacent to the rejected outliers may preferably be used for the measurement data analysis.
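A minimal sketch of such outlier rejection, assuming the first and second pixel displacements are given as H×W×2 arrays and using an illustrative deviation threshold:

```python
import numpy as np

# Reject measured displacements that deviate too far from the analytically
# calculated second displacement; rejected outliers are replaced by the mean
# of valid neighbouring displacements. The threshold is an assumption.
def reject_outliers(d_measured, d_analytic, threshold=3.0):
    error = np.linalg.norm(d_measured - d_analytic, axis=-1)
    outliers = error > threshold            # boolean mask of shape (H, W)
    corrected = d_measured.copy()
    h, w = outliers.shape
    for y, x in zip(*np.nonzero(outliers)):
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        valid = ~outliers[y0:y1, x0:x1]     # valid pixels in 3x3 neighbourhood
        if valid.any():
            corrected[y, x] = d_measured[y0:y1, x0:x1][valid].mean(axis=0)
    return corrected, outliers
```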
In a further embodiment of the device according to the present invention, the processing unit is configured to downscale the first image frame and the second image frame. Thereby, the term “downscaling” refers to reducing the resolution of an image frame, for example resizing the image frame from 1240×720 pixels to 640×480 pixels. Preferably, downscaling is done by spatial averaging of pixels. Advantages of downscaling are noise reduction (i.e. reduction of camera sensor noise) and the generation of a more stable pixel representation (e.g. super-pixels). Downscaling can be considered similar to a block-to-block (group of pixels) local registration, rather than a pixel-to-pixel local registration.
In a further embodiment of the device according to the present invention, the processing unit is configured to upscale the first pixel displacement. Thus, the considered image frame is first downscaled (e.g. from 1240×720 pixels to 640×480 pixels). Afterwards, displacement vectors of downscaled pixels or of a group of downscaled pixels are estimated for the downscaled image frame (640×480 pixels), and the estimated displacement vectors are upscaled by multiplying them with the respective resolution ratio (from 640×480 pixels to 1240×720 pixels). Then, the image frame is registered in the original resolution using the upscaled vectors. In particular, the step of downscaling improves the speed of the dense optical flow measurement and thus enhances the overall registration efficiency.
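For illustration, this downscale-estimate-upscale scheme could be sketched as follows (resolutions, interpolation modes and flow parameters are assumptions of this sketch):

```python
import cv2

# Downscale both frames, estimate the flow at low resolution, then upscale
# the displacement field and rescale the vectors by the resolution ratio.
def displacement_multiscale(ref_gray, nonref_gray, small=(640, 480)):
    full_h, full_w = ref_gray.shape[:2]
    # INTER_AREA downscales by spatial averaging of pixels.
    ref_s = cv2.resize(ref_gray, small, interpolation=cv2.INTER_AREA)
    nonref_s = cv2.resize(nonref_gray, small, interpolation=cv2.INTER_AREA)
    flow_s = cv2.calcOpticalFlowFarneback(
        ref_s, nonref_s, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # Upscale the field to the original resolution...
    flow = cv2.resize(flow_s, (full_w, full_h), interpolation=cv2.INTER_LINEAR)
    # ...and multiply the vectors with the respective size ratios.
    flow[..., 0] *= full_w / small[0]
    flow[..., 1] *= full_h / small[1]
    return flow
```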
In a yet further embodiment of the device according to the present invention, the spatial geometric constraints are based on predetermined geometric constraints between the first imaging unit and the second imaging unit. The spatial constraints may be measured using the geometric parameters of the imaging setup (e.g., imaging unit position, view angle, distance, etc.) and/or based on a measurement of the image content. Thereby, corner points or features represented in the image content may be used as references in order to derive the spatial constraints. As an example, a view relation of a 3D object acquired by a multi-camera system having at least two cameras which are spaced apart from one another (and thus having a stereo view of the 3D object) may be used as a spatial constraint. Correspondingly, for example, a view relation of 3D motion in such a multi-camera system may be used as a temporal constraint.
In a further embodiment of the system according to the present invention, the first imaging unit included in the system is a monochrome camera and/or a multi-spectrum camera and the second imaging unit included in the system is a monochrome camera and/or a multi-spectrum camera. Especially in a case where the SpO2 measurement is performed based on three different wavelengths, a system may comprise one monochrome camera and one multi-spectrum camera, wherein said cameras are spaced apart from one another. In such a case, the monochrome camera may be configured to acquire a first wavelength or wavelength range, whereas the multi-spectrum camera may be configured to acquire both a second wavelength or wavelength range and a third wavelength or wavelength range. Thus, three different wavelengths can be measured by two different cameras, which leads to a reduction of cost and of the overall system complexity.
In a further embodiment of the system according to the present invention, the first imaging unit included in the system is configured to acquire a first wavelength or wavelength range in the visible or infrared wavelength range and the second imaging unit included in the system is configured to acquire a second wavelength or wavelength range, different from the first wavelength or wavelength range, in the visible or infrared wavelength range. For example, both the first and the second wavelength or wavelength range may be a NIR wavelength or wavelength range.
In yet a further embodiment of the system according to the present invention, the system further comprises a health parameter extraction unit configured to extract vital signs of a subject based on the registered image frame. Thereby, the registered image frame obtained by the health parameter extraction unit results from the registration method according to the present invention performed by the device. With the use of the registered image frames for vital signs extraction, the impact of parallax on the vital signs extraction can be reduced. According to this embodiment of the present invention, especially vital signs extraction from image frames of 3D subjects having depth information shows higher accuracy compared to state-of-the-art vital signs extraction, e.g. based on transformation-based image registration.
These and other embodiments of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
The system 100 further comprises a device 150 for registering the first image frame 120 acquired by the first imaging unit 110 and the second image frame 140 acquired by the second imaging unit 130. The imaging units 110, 130 may also be referred to as camera-based or remote PPG sensors. Both the first image frame 120 and the second image frame 140 depict a common region of interest 160 of a subject 170. Both image frames 120, 140 include information used to determine physiological information indicative of at least one vital sign of the subject 170.
The subject 170 may be a patient, in this example a patient lying in a bed 180, e.g. in a hospital or other healthcare facility, but may also be a neonate or premature infant with very sensitive skin in NICUs, e.g. lying in an incubator, a patient with damaged (e.g. burned) skin or a person at home or in a different environment.
There exist different embodiments of a device for registering image frames acquired by different imaging units depicting a common region of interest of a subject's body, which may be used alternatively (which is preferred) or in combination. In the embodiment of the system 100, one exemplary embodiment of the device 150 is shown and will be explained below.
In one embodiment of the system 100, the first imaging unit 110 is a first camera and the second imaging unit 130 is a second camera. Here, the first camera 110 is a monochrome camera and the second camera 130 is a multi-spectrum camera. In other embodiments, both the first and the second imaging unit 110, 130 may be a monochrome camera and/or a multi-spectrum camera. Preferably, the first imaging unit 110 (the first camera 110) is configured to acquire a first wavelength (such as red light at 700 nm) or wavelength range (such as red light from 680 nm to 720 nm) or an infrared wavelength range (above 790 nm), whereas the second imaging unit 130 (the second camera 130) is configured to acquire a second wavelength (such as green light at 550 nm) or wavelength range (such as green light from 530 nm to 570 nm). The second wavelength or wavelength range is preferably different from the first wavelength or wavelength range.
In other embodiments, the system may comprise more than two imaging units spaced apart from one another. For example, according to a preferred embodiment, the system may further comprise a third imaging unit depicting the common region of interest 160 of a subject 170. The third imaging unit is preferably configured to acquire a third image frame. Preferably, the third imaging unit is configured to acquire a third wavelength or wavelength range which is preferably different from the first and the second wavelength or wavelength range.
Both the first camera 110 and the second camera 130 preferably include a suitable photosensor for (remotely and unobtrusively) capturing image frames (such as the first image frame 120 and the second image frame 140) of the region of interest 160 of the subject 170, in particular for acquiring a sequence of image frames of the subject 170 over time, from which photoplethysmography signals can be derived. The image frames captured by the cameras 110, 130 may particularly correspond to a video sequence captured by means of an analog or digital photosensor, e.g. in a (digital) camera. Such cameras 110, 130 usually include a photosensor, such as a CMOS or CCD sensor, which may also operate in a specific spectral range (visible, IR) or provide information for different spectral ranges. The cameras 110, 130 may provide an analog or digital signal.
The image frames 120, 140 include a plurality of image pixels having associated pixel values. Particularly, the image frames 120, 140 include pixels representing light intensity values captured with different photosensitive elements of a photosensor. These photosensitive elements may be sensitive in a specific spectral range (i.e. representing a specific color or pseudo-color (in NIR)). The image frames 120, 140 include at least some image pixels being representative of a skin portion of the subject 170. Thereby, an image pixel may correspond to one photosensitive element of a photo-detector and its (analog or digital) output or may be determined based on a combination (e.g. through binning) of a plurality of the photosensitive elements.
In some embodiments, the system 100 may further comprise a light source (also called illumination source), such as a lamp, for illuminating the region of interest 160, such as the skin of the subject's 170 face (e.g. part of the cheek or forehead), with light, for instance in predetermined wavelengths or wavelength ranges (e.g. in the red, green and/or infrared wavelength range(s)). The light reflected from said region of interest 160 in response to said illumination may be detected by the cameras 110, 130. In another embodiment no dedicated light source is provided, but ambient light is used for illumination of the subject 170. From the reflected light, only light in a number of desired wavelength ranges (e.g. green and red or infrared light, or light in a sufficiently large wavelength range covering at least two wavelength channels) may be detected and/or evaluated by the cameras 110, 130. Therefore, the cameras 110, 130 may be equipped with optical filters which are preferably different, though their filter bandwidths can be overlapping. It is sufficient if their wavelength-dependent transmission is different.
The device 150 according to one aspect of the invention comprises a processing unit 190 which may be a processor of a computational device, system-on-a-chip or any other suitable unit for data processing. The processing unit 190 according to the embodiment shown in
The processing unit 190 is configured to measure a first pixel displacement 200 between the first image frame 120 and the second image frame 140 and thereby executes step S100 of the method according to another aspect of the present invention (see
In order to monitor health related parameters based on the registered image frames, information included in the image frames, namely pixel-based or group-of-pixels-based color or grayscale information, may be extracted from image frames in time sequence. To this end, a health parameter extraction unit 210 may be connected to the device 150 via one or more cables or wirelessly, or may be integrated in the device 150. The health parameter extraction unit 210 may preferably be configured to extract one or more health related parameters from registered successive image frames (such as the image frames 120, 140).
A system 100 as illustrated in
In general, contactless monitoring may be more convenient than monitoring with contact sensors which is still used in a general ward or a triage in an emergency department. In addition, such contactless monitoring may be applicable for monitoring of automotive drivers as well as for sleep monitoring, wherein in the latter, especially NIR-based monitoring, preferably multi-spectral NIR-based monitoring, may be applied to improve robustness of vital signs extraction.
In the following, the non-linear adaptive image registration is explained in detail referring to
Typically, the image frames acquired by the central camera are taken as reference image frames. In
D = DOF(I_ref, I_nonref),

where DOF(·) denotes the dense optical flow, I_ref the reference image frame (i.e. image frame 120), I_nonref the non-reference image frames (i.e. image frames 140, 230) and D the first pixel displacement 200, wherein D is used to correlate/interpolate the non-reference image frames 140, 230 in order to register them with the reference image frame 120:

I_reg = Interp(I_nonref, D),

where Interp(·) denotes the interpolation/correlation and I_reg the registered image frame. The pixel-based interpolation is highly non-linear as an image transformation, and the dense optical flow measurement and interpolation are performed for each individual image frame. Thus, the registration is adaptive to the video content and robust to scenes having depth changes or object position changes (e.g., distance-to-camera changes) during monitoring.
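As a non-limiting illustration, the interpolation step I_reg = Interp(I_nonref, D) may be realized as a backward warp with bilinear interpolation (a minimal Python/OpenCV sketch; function and variable names are assumptions of this sketch):

```python
import cv2
import numpy as np

# Backward warp: each reference pixel (x, y) is filled by sampling the
# non-reference frame at (x + dx, y + dy) with bilinear interpolation,
# where (dx, dy) = D at (x, y).
def interp_register(nonref, flow):
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(nonref, map_x, map_y, cv2.INTER_LINEAR)
```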
Furthermore, according to the present invention, the first pixel displacement is corrected according to spatial and/or temporal geometric constraints. These constraints are applied as a post-processing step to the “raw” dense optical flow results described above. Since the setup (e.g., the camera positions) is preferably fixed during the measurement, the parallax-induced pixel displacement across the cameras 110, 130, 220 depends on predefined geometric relationships (e.g., epipolar geometry). Thereby, epipolar geometry considers, for example, the geometry of stereo vision, wherein two cameras view a 3D scene from two distinct positions. In such a setup, there are a number of geometric relations between 3D points and their projections onto the 2D image frames leading to constraints between these image points. These relations may be derived based on the assumption that the cameras can be approximated by a pinhole camera model. Such relationships may be used as spatial geometric constraints to smooth the measurement of the first pixel displacement 200 or to reject outliers. In
Based on the spatial geometric constraint(s), a second pixel displacement can be analytically calculated. Based on the second pixel displacement, the first pixel displacement 200 may preferably be smoothed by calculating a mean value of the first pixel displacement 200 and the second pixel displacement. In other embodiments, the second pixel displacement may be used to detect outliers in the measured first pixel displacement 200 by comparing said first pixel displacement 200 with the second pixel displacement and to correct the first pixel displacement 200 by rejecting the detected outliers.
In the example shown in
D1→2 is the measured solution,
D′1→2 = D1→3 − D2→3 is the analytic solution,
D″1→2 = (D1→2 + D′1→2)/2 is the smoothed solution,
where D1→2, D2→3 and D1→3 denote the (group of) pixel displacements from camera 110 to 130, from camera 130 to 220, and from camera 110 to 220, respectively. Thereby, D1→2 is the solution measured by dense optical flow, D′1→2 is the analytic solution deduced from D1→3 and D2→3, and D″1→2 is the smoothed solution resulting from the mean value of D1→2 and D′1→2. It should be noted that it may be derived, e.g. from further metrics (not described here), that instead of the “smoothed solution” the “measured” or “analytic” solution is more appropriate. Furthermore, it is possible to use D′1→2 to restrict measurement outliers in D1→2.
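Expressed as a sketch (reusing the Farneback-based displacement measurement from the earlier sketches; the three input frames are assumed to be synchronized grayscale images of cameras 110, 130 and 220):

```python
import cv2

def dof(a, b):
    # Dense optical flow (Farneback), as in the earlier sketch.
    return cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)

def smoothed_d12_spatial(f1, f2, f3):
    """f1, f2, f3: synchronized grayscale frames of cameras 110, 130, 220."""
    d_12 = dof(f1, f2)                    # measured solution D1->2
    d_13 = dof(f1, f3)
    d_23 = dof(f2, f3)
    d_12_analytic = d_13 - d_23           # analytic solution D'1->2
    return 0.5 * (d_12 + d_12_analytic)   # smoothed solution D''1->2
```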
Similar to the use of spatial geometric constraints, temporal geometric constraints may be used to smooth the measurement of the first pixel displacement 200 (see
D1→2 is the measured solution,
D′1→2 = T2→1 − T2→2 is the analytic solution,
D″1→2 = (D1→2 + D′1→2)/2 is the smoothed solution,
where T2→1 denotes the pixel displacement from the second camera 130 at time t to the first camera 110 at time t+1 and T2→2 the pixel displacement from the second camera 130 at time t to the second camera 130 at time t+1.
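A corresponding sketch for the temporal constraint, assuming (as one plausible reading) that the measured solution refers to time t+1 and following the sign convention of the equations above:

```python
import cv2

def dof(a, b):
    # Dense optical flow (Farneback), as in the earlier sketches.
    return cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)

def smoothed_d12_temporal(f1_t, f2_t, f1_t1, f2_t1):
    """Frames of camera 110 (f1) and camera 130 (f2) at times t and t+1."""
    d_12 = dof(f1_t1, f2_t1)      # measured solution D1->2 at time t+1
    t_21 = dof(f2_t, f1_t1)       # T2->1: camera 130 at t -> camera 110 at t+1
    t_22 = dof(f2_t, f2_t1)       # T2->2: camera 130 at t -> camera 130 at t+1
    d_12_analytic = t_21 - t_22   # analytic solution D'1->2
    return 0.5 * (d_12 + d_12_analytic)  # smoothed solution D''1->2
```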
Preferably, spatial and temporal geometric constraints are applied simultaneously (see
In
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
A computer program may be stored/distributed on a suitable non-transitory medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
Any reference signs in the claims should not be construed as limiting the scope.
Number | Date | Country | Kind |
---|---|---|---|
19183802.8 | Jul 2019 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/068337 | 6/30/2020 | WO |