The present invention relates to the field of image processing and methodologies to estimate the target size of an object of known position on an image, especially on infrared imaging systems where there is generally a contrast difference between the target and the background.
It is known that there are systems that use methods and models to estimate the position and size of an object (or equivalently a “target”) on a still image or a video frame. Infrared search and track (IRST) systems, in which infrared images of a scene are acquired and converted into grayscale format, are good examples of such systems. The acquired images consist of a two dimensional array of pixel values which represent the infrared intensities at these locations. Currently there are methods to estimate a target's location and size using statistical approaches, for example by calculating the standard deviation of intensity values of the neighboring pixels around a center pixel and by successively decreasing this neighboring area until a target is detected. This statistical approach requires determination of an image histogram, which is the distribution of pixels of different intensities. Such an approach may be considered reasonable; however, in practice additional steps are needed to make sure that the system is not affected by factors such as possible sensor noise, the texture of the target or, in the case of infrared vision, unexpected temperature changes, all of which make the problem much more complex.
Moreover, in some cases the position of the target is known in advance. A search for the target is then unnecessary and the primary objective is to determine the target size based on its pixel position. In such cases, the position of the target can be given by the user (e.g. by a pointing device) and the method will determine the target size by searching for a target inside different windows (of all possible sizes). This is a very slow process and depends on the image resolution. In addition, a simple method to determine whether the system is being affected by noise/texture is required; this is usually done by a signal to noise ratio analysis, which is also inefficient.
In another method, image gradient values can also be used to detect target position. This method is very susceptible to image noise and the focus of the method is to detect only the target position. In order to detect the target size, a method that calculates pixel value standard deviation in an area around a given target position should be used.
In order to solve the mentioned problems, methods that generate scale-spaces of images (the representation of images with different scales or magnifications) can be used. Scale-space based methods can handle issues caused by texture and/or noise and can detect the target size independent of the image resolution. There are existing methods that use image scale-space in order to determine salient features on a target in an image. However these methods clearly do not aim to estimate the target size and include complex convolution operations which are unnecessary for our purpose.
As a result, for a gray level image that includes a target object of given position, the current methods do not offer a simple and efficient way to estimate the size of a rectangle that encapsulates the whole target with minimum possible number of background pixels.
The United States patent document US20110001823, an application in the state of the art, discloses a method for detecting target location and size, using pixel standard deviation where the target size is sought by varying window size by one pixel at each iteration.
The present invention may resemble the mentioned method in the sense that it performs a target search using pixel standard deviation. However, unlike the mentioned method, the present invention generates a pixel standard deviation scale-space. In other words, while calculating the standard deviation in growing windows, the growing step size is increased and repeated for a number of iterations. Thus, the target size search is carried out in successively increasing step sizes. This allows the present invention to calculate the target size independent of the image resolution, unlike the mentioned method which varies window size by one pixel at each iteration.
The U.S. Pat. No. 7,430,303, an application in the state of the art, discloses a method for detecting a target in images using image gradient values.
The present invention differs from the mentioned method in the sense that the present invention does not use gradient values since they are very sensitive to image noise. Instead, the present invention uses pixel standard deviation as it is much more robust to image noise.
The U.S. Pat. No. 6,711,293, an application in the state of the art, discloses a method to detect salient features on images, in which difference of Gaussians are used to construct a scale-space.
The present invention resembles the mentioned method in the sense that it performs a target analysis using a scale-space constructed from the image. However the present invention generates a scale-space using standard pixel deviations, while the mentioned method generates the scale-space using difference of Gaussians. In addition, the output of the mentioned method is a list of salient points on the image, whereas the output of the present invention is the size of the target at a given pixel position.
An objective of the present invention is to provide a methodology to estimate the size of a target on an image received from a sensor, given the pixel position of the target.
An objective of the present invention is to provide a simple and efficient methodology to estimate the size of a rectangle that encapsulates the whole target with minimum possible number of background pixels using scale space of pixel standard deviations.
An objective of the present invention is to provide a methodology to estimate the target size independent of the resolution of the imaging system which is used to capture the target.
A system and method realized to fulfill the objective of the present invention are illustrated in the accompanying figures, in which:
The components illustrated in the figures are individually numbered, where the numbers refer to the following:
A method for estimating target size (100) fundamentally comprises the following steps,
First, a grey level image of the scene under consideration is received together with the coordinates of a pixel on the target on this image (101). Assuming that there is high contrast between the target and its background and low contrast within each of them, the pixel standard deviation calculated for the pixels in an imaginary window that properly encloses the target should be close to a maximum value. In theory, for the pixel standard deviation values calculated for different sized windows around the given pixel, the target size is equal to the window size that gives the maximum pixel standard deviation. Therefore the size of the imaginary window around the target, for which the pixel standard deviation is maximum, can be taken as a good estimate for the window size that encapsulates the whole target with minimum possible number of background pixels, i.e. the target size.
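The assumption above can be written compactly. The notation is introduced here for illustration only and does not appear in the original description: W_s is the set of pixels in the window of size s around the given pixel, I(p) is the grey level at pixel p, and the population form of the standard deviation is assumed.

```latex
\mu_s = \frac{1}{|W_s|}\sum_{p \in W_s} I(p), \qquad
\sigma_s = \sqrt{\frac{1}{|W_s|}\sum_{p \in W_s}\bigl(I(p)-\mu_s\bigr)^{2}}, \qquad
\hat{s} = \arg\max_{s}\ \sigma_s
```

Here \hat{s} is the window size taken as the estimate of the target size.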
In the step following (101), pixel standard deviations around the given pixel on the target are found within growing rectangular windows using different window sizes. During this step, the window size is increased in at least one direction, and the same direction or directions are used for all successive enlargements. The enlargement in each direction is done by the related step-size, starting from a predetermined size. At the end, a window size versus standard deviation graph covering every desired window size (each having a height and a width) is obtained (102).
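Step (102) can be illustrated with a minimal sketch, assuming a square window centred on the given pixel, the population standard deviation, and clipping of the window at the image border. The function names `window_std` and `std_graph`, their parameters and the synthetic test image are hypothetical and are not taken from the description.

```python
from statistics import pstdev


def window_std(image, row, col, size):
    """Population standard deviation of the pixels inside a size x size
    window centred on (row, col). The window is clipped at the image
    border (an assumption; the description does not cover borders)."""
    half = size // 2
    rows = range(max(row - half, 0), min(row + half + 1, len(image)))
    cols = range(max(col - half, 0), min(col + half + 1, len(image[0])))
    return pstdev([image[r][c] for r in rows for c in cols])


def std_graph(image, row, col, start_size=3, step=2, max_size=21):
    """Return [(window_size, std), ...] for square windows grown by
    `step` from `start_size` up to `max_size` (step 102)."""
    sizes = range(start_size, max_size + 1, step)
    return [(s, window_std(image, row, col, s)) for s in sizes]


# Synthetic 41x41 image: a uniform 9x9 target on a uniform background.
image = [[0.0] * 41 for _ in range(41)]
for r in range(16, 25):
    for c in range(16, 25):
        image[r][c] = 1.0

graph = std_graph(image, 20, 20)
for size, std in graph:
    print(size, round(std, 4))
```

For this idealized two-level image the standard deviation is zero while the window lies entirely inside the target, rises once background pixels enter the window, and falls again as the background dominates.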
For example, to estimate the size of a square or a circular target, a square window can be enlarged in both directions by the same step size. For an irregular shape (say an ellipse), the enlargement can be done in separate directions independently. Therefore, it is possible to find the change of standard deviation for different directions, thus the size of a non-symmetric shape can also be calculated. The rectangular window can be arbitrarily rotated about the target pixel coordinate, but this rotation should remain the same throughout a set of standard deviation calculations in order to obtain the standard deviation versus window size graph. Nevertheless, at the end of step (102), if the image noise is not excessive and the target does not possess detailed texture (as it usually does not for infrared images), a standard deviation versus window size graph similar to the one in Graph 1 is obtained with a maximum value.
The factors which could prevent the method from working properly in an infrared image are excessive image noise, detailed target texture and abrupt changes of temperature on the target. These conditions violate the assumption that there is high contrast between the target and background and low contrast within each of them. Under these circumstances a single graph does not resolve the problem, since more than one maximum standard deviation value may exist in it. However, the method aims at investigating the most general and significant distinction between the target and background. If the representative window could be analyzed on different scales, it would be possible to find a more general and better discrimination for the target. The window size limits are predetermined with a plausible assumption.
“Is the graph monotonically decreasing?” (103) check is applied to the graph to determine whether the graph has a maximum or not. If the answer is “no” then there should be a maximum value. If a maximum value exists in the graph, the window at the point where the standard deviation first starts to decrease (the nearest data point to the right of the maximum value on the graph) is recorded in step (104). Nearest data point to the right equivalently means the data point related to the next (larger) window size. If the graph is monotonically decreasing (i.e. answer is “yes”), then a check for a previously recorded window size (“is a previously recorded window size existent?”) is carried out in step (105). If there is a previously recorded window size, the window size that encapsulates the whole target with minimum possible number of background pixels is estimated as the last recorded window size in step (106).
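The checks in steps (103) and (104) can be sketched as two small helpers. This is a sketch under stated assumptions: "monotonically decreasing" is taken to mean that no value exceeds its predecessor, and when the maximum is the last data point the last window size itself is returned. The function names are hypothetical.

```python
def is_monotonically_decreasing(stds):
    """Step (103): True when no value exceeds its predecessor.
    Treating equal neighbours as 'decreasing' is an assumption."""
    return all(b <= a for a, b in zip(stds, stds[1:]))


def size_after_peak(sizes, stds):
    """Step (104): window size of the data point immediately to the
    right of the maximum, i.e. the next (larger) window size. If the
    maximum is the last point, that last size is returned (assumption)."""
    peak = max(range(len(stds)), key=stds.__getitem__)
    return sizes[min(peak + 1, len(sizes) - 1)]


# Hand-made graph with a single maximum at window size 7.
sizes = [3, 5, 7, 9, 11]
stds = [0.10, 0.30, 0.50, 0.40, 0.20]

if not is_monotonically_decreasing(stds):
    recorded = size_after_peak(sizes, stds)
    print("recorded window size:", recorded)
```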
As mentioned above, a maximum value (actually the point where the standard deviation first starts to decrease) prior to a monotonically decreasing graph is searched and recorded. During the search, the step-size (used to enlarge the window size for every standard deviation calculation) is increased at every iteration (108). In other words, to obtain different standard deviation versus window size graphs, the window size is enlarged with increasing levels of step-sizes. For example, if the window sizes are 3×3; 5×5; 7×7 etc. for the first graph (step-size=2), they are 3×3; 7×7; 11×11 (step-size=4) for the second run. This is similar to down-sampling the image for every iteration and provides a way to use a scale-space of pixel standard deviations. This way, a more general view is obtained with successive iterations. When a monotonically decreasing graph is found, the previous window size with the maximum standard deviation will be a good estimate for the target size. In another application of the method (100), the initial window size, which is used to generate the standard deviation graph, is also increased proportionally to the step-size in step (108) in order to make it compatible with the higher scale levels.
After step (104), or after step (105) when there is no previously recorded window size, an “is maximum iteration limit exceeded?” check is carried out in step (107). This check allows the method to iterate until a limit is reached, which prevents it from iterating indefinitely or too many times before a monotonically decreasing graph is found.
If a monotonically decreasing graph is not found before the limit is exceeded, there are two possibilities. If there is a recorded window size (i.e. there has been a graph with a maximum in a previous iteration), the target size is estimated as this recorded value in step (106). If there is no recorded window size (i.e. there has not been any graph with a maximum in a previous iteration), the target size is estimated as the predetermined minimum window size (109).
If the first graph is monotonically decreasing (since it is generally not possible to have a maximum in the graphs of higher scale levels) the method (100) may iterate until the limit is reached. In such a case, the target size can be estimated as the predetermined minimum window size (109). In another application of the method (100), an additional step that checks whether the graph is monotonically decreasing at the first iteration can be included in order to directly estimate the target size as the predetermined minimum window size (109).
The reason why a monotonically decreasing graph is expected at higher scale levels is related to the window size. If the updated window size at step (108) is large enough to encapsulate both the target and a considerable number of background pixels, then as the window size is increased further, the standard deviation decreases because more and more background pixels are involved in the calculation. Therefore, the exact scale (the iteration number of the graph) that gives the target window size is that of the graph which is succeeded by a monotonically decreasing one. Such a search algorithm prevents the method from being misled by lower scale levels (iterations with small step-sizes).
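The window sizes used at successive iterations can be listed with a short sketch. The description only states that the step-size is increased at every iteration (108); doubling it, as in the 2 to 4 example above, is an assumption, and the function names are hypothetical.

```python
def window_sizes(start_size, step, max_size):
    """Window sizes visited in one graph for a given step-size."""
    return list(range(start_size, max_size + 1, step))


def scale_space_sizes(start_size=3, first_step=2, max_size=11, iterations=2):
    """Return [(step, sizes), ...], one entry per iteration.
    The step-size is doubled at each iteration (assumption)."""
    levels = []
    step = first_step
    for _ in range(iterations):
        levels.append((step, window_sizes(start_size, step, max_size)))
        step *= 2  # step (108): increase the step-size
    return levels


levels = scale_space_sizes()
for step, sizes in levels:
    print(f"step-size={step}:", sizes)
```

With the default arguments this reproduces the two sequences quoted above, 3, 5, 7, ... for step-size 2 and 3, 7, 11 for step-size 4.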
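The iteration described in steps (102) to (109) can be put together in a minimal sketch. It is not a definitive implementation and rests on these assumptions: square windows clipped at the image border, the population standard deviation, a step-size that doubles at every iteration, and an initial window size that grows by the same amount as the step-size (one reading of the variant in which the initial window size is enlarged in step (108)). All names and default values are hypothetical.

```python
from statistics import pstdev


def window_std(image, row, col, size):
    """Population std of a size x size window centred on (row, col),
    clipped at the image border (assumption)."""
    half = size // 2
    rows = range(max(row - half, 0), min(row + half + 1, len(image)))
    cols = range(max(col - half, 0), min(col + half + 1, len(image[0])))
    return pstdev([image[r][c] for r in rows for c in cols])


def estimate_target_size(image, row, col, min_size=3, max_size=33,
                         first_step=2, max_iterations=6):
    """Sketch of method (100); returns an estimated window size in pixels."""
    recorded = None
    step = first_step
    for _ in range(max_iterations):                  # limit check (107)
        # Assumption: the initial window grows together with the step-size.
        start = min_size + (step - first_step)
        sizes = list(range(start, max_size + 1, step))
        stds = [window_std(image, row, col, s) for s in sizes]    # (102)
        decreasing = all(b <= a for a, b in zip(stds, stds[1:]))  # (103)
        if not decreasing:
            # (104): record the size just to the right of the maximum.
            peak = max(range(len(stds)), key=stds.__getitem__)
            recorded = sizes[min(peak + 1, len(sizes) - 1)]
        elif recorded is not None:                   # (105)
            return recorded                          # (106)
        step *= 2                                    # (108), doubling assumed
    # Limit reached: last recorded size (106) or the minimum size (109).
    return recorded if recorded is not None else min_size


# Synthetic 65x65 image with a uniform 9x9 target centred on (32, 32).
image = [[0.0] * 65 for _ in range(65)]
for r in range(28, 37):
    for c in range(28, 37):
        image[r][c] = 1.0

estimate = estimate_target_size(image, 32, 32)
flat = [[0.0] * 65 for _ in range(65)]
fallback = estimate_target_size(flat, 32, 32)
print(estimate, fallback)
```

In this sketch the returned size is quantized to the step-size of the last scale that still showed a maximum, so it is a coarse bound on the target rather than a pixel-exact measurement. A featureless image never produces a maximum, so the minimum window size is returned.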
A system for estimating target size (1) fundamentally comprises;
In a preferred embodiment of the present invention, image sensor (2) is a sensor configured to acquire an image with a high contrast between a possible target and its background. Preferentially image sensor (2) is an infrared vision camera which is able to differentiate temperature differences in the scene.
In a preferred embodiment of the present invention, user interface device (4) consists of a pointing device such as a mouse, joystick or similar equipment and a device able to display a two dimensional pixel image. User interface device (4) is able to receive at least one coordinate from the user for every pixel image. The user will point the target and its position will be available to be used in the method (100) implemented on the image processing unit (3). The user can also re-target anytime to correct any mistakes in the size estimation.
In a preferred embodiment of the present invention, image processing unit (3) is configured to receive at least one image from image sensor (2) and to receive at least one coordinate of a pixel on a target, when available, from user interface device (4). In another preferred embodiment, at least one coordinate of the target is received continuously (or automatically in predetermined intervals) from an automatic target searching and tracking device and the image is received from a recorded or live image sequence (video stream). Anyone proficient in the image processing field should understand that this system (1) and method (100) can be applied on a sequence of pixel images and the target size can be continuously monitored. Since the position of a pixel on the target is known, it is possible for user interface device (4) to draw on the display at least one rectangle around the target, whose size was decided by the method (100).
The method (100) can effectively estimate the size of a target on a pixel image and display a “bounding box” around the target, given a position of a pixel on the target. A simple and efficient methodology used to estimate the size of a rectangle, which encapsulates the whole target with minimum possible number of background pixels using scale space of pixel standard deviations, is obtained. Moreover, the target size is estimated independently of the visualization system's image resolution. The pixel resolution indicates how many pixels correspond to one meter of an object at a constant distance from the camera. When the imaging resolution of the visualization system changes, this value also changes. For instance, a method which searches for a proper standard deviation value in a window increased by one pixel at each iteration will function differently in a system with a different resolution. However, in our case, if the pixel resolution of the visual system is changed, the graph with a peak succeeded by a monotonically decreasing graph will show up in a different iteration and give a different rectangular target size (in pixels); it will nevertheless still encapsulate the whole target with minimum number of background pixels.
Within the scope of these basic concepts, it is possible to develop a wide variety of embodiments of the inventive system and method for estimating target size (1), (100). The invention cannot be limited to the examples described herein; it is essentially according to the claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2011/055618 | 12/12/2011 | WO | 00 | 9/22/2014 |