The present invention relates to the field of feature recognition in imaging, and more specifically to the real-time detection of polyps within regions of the colon using machine learning techniques.
Polyps are abnormal growths of tissue that may be benign or malignant. Early detection of polyps, independently of the risk they may represent, is an important step towards both cancer prevention and reduced mortality. Colorectal cancer (CRC) is one of the main causes of death by cancer in the world, with an estimated incidence of 1,370,600 new cases worldwide in 2012 and a fatal outcome in 50% of cases. Today, up to 26% of polyps can be missed during optical colonoscopy sessions. This relatively high figure is partly due to the circumstances in which images are acquired within colon regions, and partly due to the size of the polyps. A number of parameters contribute to this high miss rate; they are as follows:
Furthermore, polyp detection is hampered by the fact that some current methodologies require high computational power and lengthy post-detection analysis of the acquired images. Last but not least, as of today, analysis of the acquired images has to be performed offline, meaning that the delay between the instant of detection and the analysis of the result spans from seconds to hours. This has repercussions on both the detection process and the result itself.
There are considerable numbers of techniques to detect polyps. In order to evaluate these techniques and compare them, some quantitative and qualitative metrics have been introduced. The metrics relate firstly to the capability of the technique to detect polyps and secondly how genuine these detections are. These metrics are as follows:
A few systems and methodologies currently compete to provide more reliable detection while sustaining a certain degree of consistency as to whether a polyp has been detected or not. Two major paths are under investigation: the software-based and the hardware-based approaches. Even though both approaches have their strengths and weaknesses, so far neither has clearly prevailed over the other.
In the computer-assisted approach, the hardware aspect of the apparatus is less of a concern. In the prior art, “Towards Automatic Polyp Detection with a Polyp Appearance Model”, J. Bernal et al., Pattern Recognition, 2012, vol. 45, no. 9, p. 3166-3182, a software approach was used to detect polyps. The method works on colonoscopy images and focuses on describing regions as a function of the depth of valleys. A three-stage detection algorithm applied to captured images allows the appearance of a polyp to be modeled. The method uses a region descriptor based on the depth of valleys, called SA-DOVA (Sector Accumulation-Depth Of Valleys). The resulting algorithm is divided into three steps: region segmentation, region description, and region classification with a binary decision on SA-DOVA maxima. This method takes on average up to 19 seconds to process a single image. The method is also limited to a certain extent, because it pre-selects the regions of the image where polyps are likely to be found. Other regions are not processed further, which could lead to some polyps being missed.
The document WO2015/031641 also discloses a software-based system and method for automatic polyp detection. The method teaches steps for automatic boundary detection of polyps and is parameterized to detect the edges of polyps within previously acquired images. The detection is shape-independent and captures color variation across the boundary of polyps with Haar feature extraction. Haar features are defined, for example, as the difference between the sums of pixel intensities inside neighboring rectangles. Haar features indicate the presence or absence of certain characteristics in the image, such as a change of texture or color between the neighboring rectangles on which the calculations are performed. Haar features tend to require a substantial amount of computational power. To train the classifier, the method uses a random forest, which is a regression algorithm built with 30 decision trees based on pixel features previously detected by Canny's algorithm.
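By way of illustration only, the two-rectangle Haar feature described above can be sketched as follows. The use of an integral image (summed-area table) is a common way of computing rectangle sums and is an assumption here, as are all function names; the cited document is not stated to proceed in this way.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels of a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left rectangle sum minus right rectangle sum (each height x width)."""
    left_sum = rect_sum(ii, top, left, height, width)
    right_sum = rect_sum(ii, top, left + width, height, width)
    return left_sum - right_sum

# Toy image (hypothetical data): dark left half, bright right half.
img = np.zeros((4, 4), dtype=np.uint8)
img[:, 2:] = 10
ii = integral_image(img)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 4, 2)  # straddles the edge
flat_response = haar_two_rect_horizontal(ii, 0, 2, 4, 1)  # inside the bright half
```

In this toy case the feature is large in magnitude where the two rectangles straddle the intensity change and zero where both rectangles lie in a uniform area.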
Even though the detection rate of the method is among the state-of-the-art results, it appears to take a certain time to process an image. This is most likely due to the computational power required to run all the different algorithms.
The hardware approach may lean towards state-of-the-art miniaturized devices that can, for example, be swallowed, such as a capsule endoscopy camera. Such a device can be the size of a vitamin capsule and embed, for example, a camera, wireless capabilities and the sub-system necessary to process images. The patient may then be equipped with a recording device capable of storing the images sent from the capsule. From the time a capsule begins its journey through the intestine until it is naturally expelled from the body, it can take up to 50,000 images. These images are processed manually, which is a tedious task and increases the chances of missing polyps. The document US2015/0065850 discloses a method of detecting polyps in images acquired with a wireless capsule endoscope. The principle of the method is to prune the plurality of images acquired by the device so as to keep only those which are likely to contain polyps. Features of images containing polyp candidates are extracted before a regression step is applied to determine whether the candidate is actually a polyp. In this method, Local Binary Patterns (LBPs) are used to extract features of images. With LBPs, a targeted pixel is assigned a single-bit weight according to its intensity relative to a pixel belonging to a circular set of neighboring pixels. If the intensity of the targeted pixel is greater than that of its neighbor, the assigned weight is 0; otherwise it is 1. The assignment of binary weights is repeated for all the pixels of the circular set of neighboring pixels until a binary word is formed. LBP is an effective texture-based means of extracting local features of an image, yet it requires very little computational power. Even though the method takes as little as 0.83 second to process a single image, its true positive detection rate of 64.8% is not on par with its speed.
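By way of illustration only, the LBP bit assignment described above can be sketched as follows. The sketch assumes the eight immediate neighbors of a 3×3 patch as the "circular set" and an arbitrary clockwise bit order starting at the top-left neighbor; the names `NEIGHBOR_OFFSETS` and `lbp_image` are hypothetical.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbors, clockwise from the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid binary word.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray):
    """Return the 8-bit LBP code of every interior pixel of a 2-D array."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(NEIGHBOR_OFFSETS):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Bit is 0 when the center is strictly brighter than the neighbor,
        # and 1 otherwise, as in the description above.
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

# Hypothetical 3x3 patch; the center pixel has intensity 6.
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
code = int(lbp_image(patch)[0, 0])  # bits 1, 3 and 5 are set: 2 + 8 + 32 = 42
```

Only comparisons and bit shifts are involved, which is consistent with the low computational cost attributed to LBPs.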
However, none of these documents provides a solution for implementing a real-time detection method or system with comparatively low computational power (e.g. that provided by an everyday computer) while sustaining a satisfactory true positive detection rate.
The present invention introduces a fast, and therefore real-time compliant, method for detecting polyps, including means to signal such detection. The method is software-based and can be performed with a computational power similar to that provided by a laptop. One aspect of the method is the extraction of local features, e.g. using LBPs, in order to take advantage of their speed and low computational requirements. Another aspect is the use of a machine learning technique to train a classifier using a boosting algorithm, so that it identifies polyps and learns from its mistakes. Once trained, the classifier, arbitrarily named the strong classifier, is capable of identifying polyp features in tens of milliseconds. Because of the assistance provided by the classifier, the skills of the clinician performing the detection are not as decisive as they may be in other techniques.
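By way of illustration only, the principle of a boosted "strong" classifier built from weak classifiers can be sketched as follows. This is a from-scratch sketch of discrete AdaBoost with threshold stumps on hypothetical one-dimensional data; it is not the implementation of the invention, and the names `train_adaboost` and `predict` are assumptions.

```python
import numpy as np

def train_adaboost(X, y, rounds=3):
    """X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of weighted stumps (alpha, feature, threshold, sign)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # uniform sample weights to start
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):               # exhaustive search over stumps
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] >= thr, sign, -sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak classifier
        stumps.append((alpha, j, thr, sign))
        w *= np.exp(-alpha * y * pred)   # misclassified samples gain weight
        w /= w.sum()
    return stumps

def predict(stumps, X):
    """Sign of the weighted vote of all stumps (the strong classifier)."""
    score = sum(a * np.where(X[:, j] >= t, s, -s) for a, j, t, s in stumps)
    return np.where(score >= 0, 1, -1)

# Hypothetical toy data that no single threshold can separate.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])
strong = train_adaboost(X, y, rounds=3)
labels = predict(strong, X)
```

Each round increases the weight of the samples that the previous weak classifier got wrong, which is the sense in which a boosted classifier can be said to learn from its mistakes.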
The method consists of first acquiring images in real time within the colon region at a video stream frame rate. Then, only one color channel is selected for each individual image in order to reduce the computation time. The next step consists of scanning the entire area of each image with a sliding sub-window and extracting, for each position of the sliding sub-window, the local features of the colon region of interest. Next, these local features are passed through a classifier in order to determine whether or not a polyp is present at the scanned position of the sliding sub-window. Finally, if a polyp is detected, the region where the polyp is detected is framed in real time on a display.
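By way of illustration only, the channel selection and sliding sub-window scan described above can be sketched as follows. The 20-pixel stride, the assumption that channel index 0 is the blue channel (as in a BGR pixel layout), and the toy `classify` callable standing in for feature extraction plus the trained classifier are all hypothetical.

```python
import numpy as np

def sliding_windows(height, width, win=60, stride=20):
    """Yield the (top, left) corner of every sub-window position."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left

def detect_polyps(frame, classify, channel=0, win=60, stride=20):
    """Return (top, left, win, win) boxes where classify(patch) is true."""
    single = frame[:, :, channel]        # keep one color channel only
    boxes = []
    for top, left in sliding_windows(*single.shape, win=win, stride=stride):
        patch = single[top:top + win, left:left + win]
        if classify(patch):              # stand-in for features + classifier
            boxes.append((top, left, win, win))
    return boxes

# Hypothetical frame: a bright 60x60 block in channel 0 of a dark image.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
frame[20:80, 40:100, 0] = 255
boxes = detect_polyps(frame, classify=lambda p: p.mean() > 200)
```

The returned boxes are the positions that would be framed on the display; in a real system the `classify` callable would extract the local features of the patch and pass them to the trained classifier.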
This method first of all has the advantage of being non-invasive. This is becoming an increasingly decisive criterion for both patients and medical staff when choosing a technique to be used in colonoscopy. Because the detection method is fast, its result is rendered in real time. Even though it is fast, it still yields a high rate of true positive detection. Using the same database, some techniques of the art return similar figures in terms of true positive detection rate, but are several orders of magnitude slower.
An object of the invention is then a method of performing real-time detection and displaying of polyps in optical colonoscopy, characterised in that it comprises the steps of:
A “local feature” is a numerical value, or set of values, defined by a function of a set of neighboring color pixels of the selected color channel (more precisely, of their luminance). The neighboring pixels are adjacent pixels chosen in a small area (whose exact size and shape depend on the local features considered) around a “central” pixel. For instance, a local feature may be a function of the central pixel and some or all of its first neighbors only (i.e. pixels in direct contact with the central pixel), or of the central pixel and some or all of its first and second neighbors only (second neighbors are pixels in direct contact with at least one first neighbor of the central pixel, except the central pixel itself and other first neighbors thereof), or of the central pixel and some or all of its first, second and third neighbors only (third neighbors are pixels in direct contact with at least one second neighbor of the central pixel, except first neighbors and other second neighbors). Advantageously, a number A>1 of local features is computed for each pixel of the sliding window. The overall number of local features computed for an n×m pixel sliding window is then A·m·n. A typical value for “A” is 10 or more. Considering a 60×60 pixel sliding window, the number of local features computed for each position of the window may be of the order of 36,000.
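By way of illustration only, the neighbor definitions and the feature count given above can be checked with the following sketch. It assumes that "direct contact" includes diagonal contact, so that the k-th neighbors form a square ring around the central pixel; the function names are hypothetical.

```python
def neighbor_ring(k):
    """Offsets (dy, dx) of the k-th neighbors of a central pixel,
    i.e. the pixels at Chebyshev distance exactly k (assumed definition)."""
    return [(dy, dx)
            for dy in range(-k, k + 1)
            for dx in range(-k, k + 1)
            if max(abs(dy), abs(dx)) == k]

def features_per_window(a, n, m):
    """Number of local features for an n x m window with A features per pixel."""
    return a * n * m

ring_sizes = [len(neighbor_ring(k)) for k in (1, 2, 3)]  # first, second, third
total = features_per_window(10, 60, 60)                  # A = 10, 60x60 window
```

Under this assumption a central pixel has 8 first neighbors, 16 second neighbors and 24 third neighbors, and the count for A = 10 and a 60×60 window matches the order of magnitude stated above.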
Local features shall not be confused with global features. A global feature is a mean of a feature of all the single color pixels of the image, or of the sliding window, or of a “patch”, i.e. a set of the sliding window including more than a central pixel and its neighbors as defined above.
According to particular embodiments of the invention:
Another object of the invention is a system for real-time image detection and displaying of polyps comprising an input port (104) for receiving a video stream, an image processor (105) for processing images from said video stream and an output port (106) for outputting processed images (109), characterised in that the image processor is configured for:
a) acquiring and displaying a plurality of real-time images within colon regions at a video stream frame rate, each real-time image comprising a plurality of color channels;
b) selecting one single color channel per real-time image for obtaining single color pixels, each real-time image comprising a plurality of color channels;
c) scanning the single color pixels across each said real-time image with a sliding sub-window;
d) for each position of said sliding sub-window, extracting a plurality of single color pixels local features of the real-time image;
e) passing the extracted single color pixels local features of the real-time image through a classifier to determine if a polyp is present within the sliding sub-window;
f) real-time framing (108) on display (102) of colon regions corresponding to positions of said sliding sub-window wherein polyps are detected.
According to particular embodiments of the invention:
A more complete understanding of the present disclosure may be acquired by referring to the following description taken in conjunction with the accompanying drawings wherein:
While the present invention is susceptible to various modifications and alternative forms, specific example embodiments thereof have been shown in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific example embodiments is not intended to limit the disclosure to the particular forms disclosed herein, but on the contrary, this disclosure is to cover all modifications and equivalents as defined by the appended claims.
The video may be a High Definition (HD, 1440×1080 pixels) or a Standard Definition (SD, 720×480 pixels) video.
Table 1 below shows the computation time required for obtaining a non-reinforced classifier (1st line) and a 1st, 2nd and 3rd reinforced classifier (2nd, 3rd and 4th lines, respectively). These classifiers were created using the same computer (a 64-bit Windows machine with 32 GB of RAM). For each image of the training database, the researchers identified and isolated the position of one polyp (positive example) and also isolated 5 regions without polyps (negative examples). To test the ‘blue’ component, classifiers were first trained with 550 positive examples and 3000 negative examples. Then, for the active learning reinforcement, the three successive classifiers were trained using 6000, 7500 and 8500 negative examples, respectively.
Table 2 below is a comparison of the computation results of first classification performed in step s3 according to various embodiments of the present invention, which are for different visible single color channel selections.
From Table 2, one can see that the single color channel blue is not only capable of detecting correctly the highest number of polyps but it is also the one necessitating the shortest time on average to process one single image.
The present invention uses the same database as the previously cited prior art of J. Bernal et al., which makes it easier to compare the performances of both techniques. In Table 3, the best performances are overall obtained with the present invention. The results of the present invention are those obtained after three reinforcements of the classifier while using LBPs. They are shown and compared to the most up-to-date reports at the time of writing.
Both methods have approximately the same sensitivity: 89% for the prior art of J. Bernal et al. and 86% for the method of the present invention. Even though both methods are on par in terms of sensitivity, their F2 scores differ substantially, in favor of the prior art. The difference is even greater for the average processing time per image, where the prior art is almost 550 times slower than the method of the present invention. This makes the method of the invention real-time compliant with satisfactory detection results.
Table 4 below shows the effect of the active learning strategy on the performances (recall, precision, F2 score, average detection time) of the inventive method. It can be seen that this strategy significantly improves the overall performances, particularly recall and F2 score.
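By way of illustration only, the relation between precision, recall and the F-scores referred to in the description can be sketched with the standard F-beta definition. The numerical values below are hypothetical and are not taken from the tables.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical operating point: every polyp found, half the alarms false.
f1 = f_beta(precision=0.5, recall=1.0, beta=1)  # 2/3
f2 = f_beta(precision=0.5, recall=1.0, beta=2)  # 5/6
```

Because the F2 score weights recall more heavily than precision, it rewards a detector that misses few polyps even at the cost of some false alarms.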
The implementation of the real-time detection method is not limited to a computer system. In other embodiments, one can take advantage of the low computational power requirements and use GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays) or even integrated computer systems such as Raspberry Pi boards to implement the method.
The AdaBoost algorithm of the present invention is developed with OpenCV. Other embodiments may include the use of different means or database to develop the algorithm.
Other embodiments may also use a different boosting algorithm such as LogitBoost.
In one embodiment, the scanning is performed p times, with p odd and greater than one, each time using a different size of the sliding window. The classifier is then applied to all p scans in order to decide whether a polyp is detected or not, e.g. through a majority vote (in this case, a polyp is considered present if it is detected at least (p+1)/2 times).
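By way of illustration only, the majority vote over scans performed with different window sizes can be sketched as follows. How detections at different window sizes are associated with one and the same image location is not specified here and is left out of the sketch; the function name is hypothetical.

```python
def majority_vote(detections):
    """detections: one boolean per scan (one scan per window size).
    A polyp is considered present if detected at least (p + 1) / 2 times."""
    p = len(detections)
    if p < 3 or p % 2 == 0:
        raise ValueError("the number of scans must be odd and greater than one")
    return sum(bool(d) for d in detections) >= (p + 1) // 2

# Hypothetical outcomes of p = 3 scans at one location.
present = majority_vote([True, False, True])    # 2 of 3 scans agree
absent = majority_vote([True, False, False])    # only 1 of 3
```

Requiring p to be odd ensures that the vote can never end in a tie.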
When using videos, e.g. at a typical rate of 25 frames/sec., rather than sets of still images, a significant improvement of the performances can be obtained through a “spatio-temporal coherence processing” stage. The idea is to improve the polyp detection rate and stability by combining “present” information, provided by the current frame, and “past” information, provided by previous frames showing a same region of the colon.
This approach is a spatial block fusion strategy that reduces the number of candidates by selecting as final candidate ROIs only those with a higher degree of overlap among all the candidate boxes initially provided by the method. The spatial block fusion is applied to successive images of a video in order to confirm or reject the detection of a polyp by the method of the invention.
More precisely, the final sub-windows identified as polyps are defined as ROIs (Regions of Interest). If multiple ROIs are located in the same region of the image, a fusion strategy is used. This strategy consists in merging the ROIs that overlap sufficiently (e.g. by 50% or more of their surfaces) into a single merged ROI, denoted ROIfinal.
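By way of illustration only, the fusion of overlapping ROIs can be sketched as follows. The sketch assumes that the overlap is measured as the intersection area divided by the area of the smaller box, and that two merged ROIs are replaced by their common bounding box; both choices, and the function names, are assumptions.

```python
def area(box):
    """Area of a box given as (x0, y0, x1, y1); zero if the box is empty."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    smaller = min(area(a), area(b))
    return area(inter) / smaller if smaller else 0.0

def fuse_rois(rois, min_overlap=0.5):
    """Repeatedly merge any two ROIs whose overlap ratio reaches min_overlap."""
    rois = list(rois)
    merged = True
    while merged:
        merged = False
        for i in range(len(rois)):
            for j in range(i + 1, len(rois)):
                if overlap_ratio(rois[i], rois[j]) >= min_overlap:
                    a, b = rois[i], rois[j]
                    # Replace the pair by its bounding box, then restart.
                    rois[i] = (min(a[0], b[0]), min(a[1], b[1]),
                               max(a[2], b[2]), max(a[3], b[3]))
                    del rois[j]
                    merged = True
                    break
            if merged:
                break
    return rois

# Hypothetical candidates: two overlapping boxes and one isolated box.
candidates = [(0, 0, 60, 60), (20, 0, 80, 60), (200, 200, 260, 260)]
fused = fuse_rois(candidates)
```

In this toy case the first two boxes share two thirds of their surface and are merged, while the isolated box is kept as it is.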
According to this approach, a polyp detection is confirmed at time ti, corresponding to the “ith” frame, if and only if the polyp has been detected, at the same location, in at least two among the “(i−2)th”, “(i−1)th” and “ith” frames.
Majority voting can also be performed on more than three frames, and other spatio-temporal coherence processing methods may be applied to the invention.
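By way of illustration only, the temporal majority vote over the most recent frames can be sketched as follows for a single image location. The matching of detections to "a same location" across frames is assumed to have been done beforehand, and the function name is hypothetical.

```python
from collections import deque

def confirm_stream(flags, n=3):
    """flags[i] is True when the classifier fired at the location in frame i.
    Frame i is confirmed when at least (n + 1) / 2 of the last n frames fired."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    needed = (n + 1) // 2
    window = deque(maxlen=n)     # keeps only the n most recent frames
    confirmed = []
    for flag in flags:
        window.append(bool(flag))
        confirmed.append(sum(window) >= needed)
    return confirmed

# Hypothetical per-frame detections at one location.
raw = [True, False, True, True, False, False, False]
stable = confirm_stream(raw, n=3)
```

With n = 3, an isolated detection in a single frame is not confirmed, whereas a detection that is missed in one frame but present in the two preceding frames remains confirmed, which stabilizes the displayed result.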
In other words, step e) of the method comprises:
x and y being non-zero numbers.
When the image If is the final image of the series, the method still performs fully real-time detection, because the calculation carried out in the method of the invention takes less than 30 ms. This allows the Region of Interest ROIdisplayed to be displayed at the same time as the image, which takes 40 ms to appear in a video (at 25 images/second).
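By way of illustration only, the timing argument above can be checked with the following sketch, which uses the figures given in the description (25 images/second, less than 30 ms of processing, and 19 seconds per image for the cited prior art). The function names are hypothetical.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

def is_real_time(processing_ms, fps=25):
    """True when one frame is processed before the next one arrives."""
    return processing_ms < frame_period_ms(fps)

budget = frame_period_ms(25)           # 40.0 ms per frame at 25 images/second
ok_invention = is_real_time(30)        # True: 30 ms fits within 40 ms
ok_prior_art = is_real_time(19_000)    # False: 19 s per image does not
```

A processing time below the frame period is what allows the framing to be displayed together with the frame it refers to.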
In another embodiment, the image If can be in the middle of the series of images. For instance, if n=5, If is preceded by two images and followed by two images.
Advantageously, x is equal to or greater than 50, and y is equal to or greater than 70. These values allow the ROI to encompass the polyp with some precision.
Advantageously, the ROIdisplayed is calculated for series of n successive images, n being an odd natural integer greater than or equal to 3, and a polyp is considered present in the ROIfinal if the polyp is detected at least (n+1)/2 times in the series of n images.
To sum up, the method takes into account three successive majority votes:
Table 5 below shows the results obtained using two different kinds of local features for polyp detection (LBP and Haar-like features), with (STC) and without spatio-temporal coherence processing. Active learning was not used (“N0” suffix).
The following metrics were used to measure performances obtained by the inventive method on videos:
In Table 5, it can be seen that for all 18 videos considered, the polyp was detected in a significant number of frames, leading to a PDR of 100%. The Mean Processing Time per frame is only 24 ms using Haar-like features without spatio-temporal coherence processing and 36 ms with it, which is fully compatible with real-time use. It can also be observed that spatio-temporal coherence processing leads, for both kinds of local features, to an improvement of the global performances in terms of Precision and Recall as well as F1 score. The Reaction Time also increases when spatio-temporal coherence processing is used, with a mean delay of 1.5 s, which nevertheless remains compatible with clinical use.
Table 6 shows the results obtained using both spatio-temporal coherence processing and active learning. It can be seen that the combined use of the active learning strategy and spatio-temporal coherence processing leads to a significant improvement of the overall performance in terms of Precision and Recall, without altering the 100% Polyp Detection Rate. HaarN1 appears to be the best choice of local features, with an MPT of only 21 ms and a Reaction Time of only 1.1 s.
In Table 6:
N0 represents no active reinforcement;
N1 represents one active reinforcement;
N2 represents two active reinforcements;
Number | Date | Country | Kind |
---|---|---|---|
16177835.2 | Jul 2016 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/064972 | 6/19/2017 | WO | 00 |