The present invention relates to an image processor, and more particularly, to an image processor capable of extracting a specific object from an image.
In recent years, research on image processors and image processing methods has been actively conducted in which a subject or a landscape is picked up by image inputting means such as a TV camera or a CCD camera and the obtained dynamic image is subjected to image processing, thereby extracting a specific object, e.g., an object moving in an environment, or its movement from the image (e.g., see patent documents 1 to 5).
In the field of automobiles, such an image processor is used for picking up a forward landscape by a CCD camera or the like mounted on a vehicle and extracting a pedestrian or another vehicle from the dynamic image, thereby avoiding an accident such as a collision (see the patent documents 1 to 3). In the field of artificial intelligent robots, the image processor is used in such a manner that the robot finds another moving object while observing an environment by means of a mounted camera, the moving object is measured, and an action of the robot with respect to the moving object is determined (see the patent document 4). Research on such image processors is being conducted, and they are being put into actual use.
To extract a specific object from the dynamic image, such an image processor employs an image processing method in which an input image is obtained by two CCD cameras separated from each other in the horizontal direction and the obtained input image is subjected to image processing to extract a contour of the specific object, an image processing method in which an optical flow is calculated from the input image to extract the specific object, or an image processing method in which the input image is checked against models registered in a database by pattern matching processing to extract the specific object.
However, these methods usually require much labor for constructing a processing program, and such a processing program must be constructed for each target specific object. Thus, image processing methods and image processors capable of easily constructing a processing program and obtaining a general processing program are desired.
In the field of image processing of a static image, in recent years, there has been proposed an image processing technique (ACTIT) in which an input image I is subjected to image processing based on a processing program in which various image filters F are combined in tree structure as shown in
More specifically, in this image processing technique, a document input image I comprising printed characters and handwritten characters is subjected to the image processing by a processing program as shown in
In the non-patent document 1, it is proposed to employ a genetic programming (hereinafter, GP) technique to automatically optimize the combination of the various image filters F. This automatic construction method of image transformations is called ACTIT (Automatic Construction of Tree-structural Image Transformations) hereinafter.
Patent Document 1: Japanese Patent Application Publication Laid-open No. H5-265547
Patent Document 2: Japanese Patent Application Publication Laid-open No. H10-11585
Patent Document 3: Japanese Patent Application Publication Laid-open No. 2002-83297
Patent Document 4: Japanese Patent Application Publication Laid-open No. 2001-84383
Patent Document 5: Japanese Patent Application Publication Laid-open No. H9-271014
Non-patent Document 1: Shinya AOKI and one other, "Automatic Construction of Tree-structural Image Transformations (ACTIT)", The Journal of the Institute of Image Information and Television Engineers, vol. 53, no. 6, 1999, pp. 888-894.
Hence, it is expected that the image processing technique described in the non-patent document 1 can be applied to the purpose of extracting a specific object such as a moving object from a dynamic image.
However, this image processing technique is for subjecting a static image to image processing as described above. More specifically, the same static image must be input repeatedly as the "input image I" of the processing program shown in
If an optical flow technique, which is capable of extracting a moving direction or a moving distance of each point on a moving object from an image, can be incorporated in the ACTIT method, it is expected that the optical flow technique functions effectively when an object which moves in the image is extracted from a dynamic image, and that the precision of extraction is enhanced.
Hence, it is an object of the present invention to provide an image processor capable of expanding the image processing technique ACTIT so that the same image processing technique can be applied to a dynamic image, capable of extracting a specific object from the dynamic image based on a processing program comprising various image filters which are combined in a tree structure, and capable of extracting a specific object having time variation or displacement. It is also an object of the invention to provide a general image processor capable of easily obtaining such a processing program.
It is another object of the invention to provide an image processor capable of employing an optical flow technique in the ACTIT technique, which can automatically optimize a processing program comprising various image filters combined in a tree structure, so that a moving object can precisely be extracted from a dynamic image.
To solve the above problem, according to an invention described in claim 1, there is provided an image processor in which an image picked up by an imaging apparatus is subjected to image processing to extract a specific object, the image processor comprising
an image processing section which subjects a plurality of images picked up by the imaging apparatus to the image processing based on a processing program in which image filters are combined in a tree structure, and which forms an output image from which the specific object is extracted, wherein
the plurality of images are a plurality of kinds of images constituting a dynamic image picked up by the imaging apparatus at time intervals from each other.
According to the invention described in claim 1, the configuration of the tree structure processing program processed by the image processing section of the image processor does not have only the same static image as a terminal symbol unlike the conventional technique, but is a processing program of tree structure in which a plurality of kinds of images are terminal symbols.
It is preferable that the image processor includes a processing program forming section for forming the processing program, and the processing program forming section forms the processing program by genetic programming using the plurality of kinds of images, a target image and a weight image.
The weight image is set such that the ratio of the weight of its extraction region to the weight of its non-extraction region becomes equal to the reciprocal of the area ratio of the extraction region to the non-extraction region.
It is preferable that the processing program forming section forms the processing program using a plurality of learning sets comprising the plurality of kinds of images, the target image and the weight image.
It is preferable that a fitness used for the genetic programming in the processing program forming section is calculated such that a value of the fitness is smaller as the number of nodes is greater.
It is preferable that the ratio of the number of nodes to the fitness is varied in accordance with the number of generations in the process of evolution in the genetic programming.
It is preferable that a value of a fitness used for genetic programming in the processing program forming section is greater as the number of nodes of a two-input image filter in the processing program is greater.
It is preferable that the ratio of the number of nodes of the two-input image filter to the fitness is varied in accordance with the number of generations in the process of evolution in the genetic programming.
It is preferable that the processing program is formed by combining a plurality of processing programs.
It is preferable that an output image is formed by non-linear superposition of processing by the plurality of processing programs.
It is preferable that a mask filter is included in the image filter.
It is preferable that the image processor includes a display section for displaying an image, and the output image formed by the processing program is displayed such that the output image is superposed on the input image displayed on the display section.
It is preferable that the image processing section subjects a plurality of images constituting a dynamic image picked up by the imaging apparatus and an optical flow image produced by these images to the image processing based on the processing program in which the image filters are combined in the tree structure.
It is preferable that the image processor includes a processing program forming section for forming the processing program, the processing program forming section outputs a processing program which is optimized by genetic programming using the plurality of images, the optical flow image, the target image and the weight image.
It is preferable that the image processing section respectively converts the plurality of images picked up by the imaging apparatus into images viewed from above in a pseudo manner.
It is preferable that the image processing section inputs the plurality of converted images and the optical flow image produced based on the plurality of converted images to the processing program.
It is preferable that the processing program forming section carries out learning by the genetic programming using the plurality of converted images, the optical flow image produced based on the plurality of converted images, the target image and the weight image, and outputs the optimized processing program.
It is preferable that the optical flow image is an image expressing information of size of calculated flow in terms of a gradation value.
It is preferable that the optical flow image is an image expressing information of direction of calculated flow in terms of a gradation value.
It is preferable that the flow in the optical flow image is a flow with respect to a moving plane of the imaging apparatus converted based on a moving state of the imaging apparatus.
It is preferable that in the optical flow image, a gradation value of a picture element portion where reliability of calculated flow is low is set to zero.
It is preferable that the plurality of images are respectively converted into a state where a vantage point is moved upward with respect to the plurality of images picked up by the imaging apparatus.
According to the invention described in claim 1, the configuration of the tree structure processing program processed by the image processing section of the image processor does not have only the same static image as a terminal symbol unlike the conventional technique, but is a processing program of tree structure in which a plurality of kinds of images are terminal symbols. Therefore, the image processing technique of the conventional ACTIT (see the non-patent document 1) can be expanded, and the ACTIT technique can be applied also to dynamic images in which frames have different images.
Images which are simultaneously input are compared with each other, and difference processing or logical product processing is carried out. With this, image processing in which factors such as positional deviation of a specific object between the images are taken into account can be carried out, and it is possible to extract a specific object having time variation or spatial displacement in the image.
If the image processor includes an image processing section which subjects a plurality of images constituting a dynamic image picked up by the imaging apparatus and an optical flow image produced from these images to the image processing in accordance with a processing program in which various image filters are combined in tree structure, and which forms an output image from which a specific object is extracted, the ACTIT technique, which inputs the same static image to the tree structure processing program and effectively extracts the specific object from the image, can be expanded such that a plurality of images having substantially the same overall structure and the optical flow image produced from these images are input to the tree structure processing program, and the specific object is extracted from the images constituting the dynamic image.
Especially, as an image to be input to the processing program, the optical flow image produced from the plurality of images constituting the dynamic image is input. With this, a region on the image corresponding to a moving object showing a peculiar flow in the dynamic image can be clearly indicated to the processing program. Thus, when the specific object to be extracted from the dynamic image is a moving object, the image processor of the present invention can reliably and precisely extract the moving object.
Embodiments of an image processor of the present invention will be explained with reference to the drawings.
In the embodiment, an image processor which is mounted on a vehicle and which extracts a pedestrian from a forward landscape image of a vehicle will be explained.
The image input section 2 includes an imaging apparatus 21 capable of converting a picked up image into an electric signal. A CCD camera using a solid-state image sensing device such as a charge-coupled device (CCD) is used as the imaging apparatus 21. In this embodiment, the imaging apparatus 21 of the image input section 2 is mounted on an inner side of the windshield near the rearview mirror of the vehicle (not shown) such that the imaging apparatus 21 can pick up an image of the front side. Like a normal television image, the imaging apparatus 21 picks up an image of the front of the vehicle every 1/30 seconds and sends the input image to the image processing section 3.
In this embodiment, a unit of the input image sent at constant time intervals is called one frame. That is, in this embodiment, input images of 30 frames are sent to the image processing section 3 from the image input section 2 per second.
The display section 4 having a monitor and the memory 5 are connected to the image processing section 3. The image processing section 3 sends, to the display section 4, the input image sent from the image input section 2 and displays it on the monitor, and at the same time, the image processing section 3 temporarily stores the input image in the memory 5 in succession.
A processing program in which various image filters are combined in tree structure is stored in the image processing section 3. The image processing section 3 carries out image processing in accordance with the processing program to form an output image.
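Although the concrete filters of the embodiment are listed later in Table 1, the way such a tree structure processing program operates can be illustrated with a minimal sketch. The filter set, the nested-tuple node representation and the frame names below are assumptions made only for illustration, not the actual filters or data structures of this embodiment.

```python
# Minimal sketch of a tree structure processing program (ACTIT-style).
# Filter names, tree representation and frame names are illustrative only.
import numpy as np

def mean_filter(img):
    # 3x3 mean filter computed from an edge-padded neighborhood
    h, w = img.shape
    p = np.pad(img, 1, mode="edge").astype(np.float32)
    acc = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return (acc / 9.0).astype(np.uint8)

def diff_filter(a, b):
    # absolute difference of two input images (a two-input image filter)
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)

FILTERS_1IN = {"mean": mean_filter}
FILTERS_2IN = {"diff": diff_filter}

def evaluate(node, inputs):
    # A tree is a nested tuple (filter_name, child, ...); a string leaf is a
    # terminal symbol naming an input image such as "t" or "t-1".
    if isinstance(node, str):
        return inputs[node]
    name, *children = node
    args = [evaluate(c, inputs) for c in children]
    f = FILTERS_1IN.get(name) or FILTERS_2IN[name]
    return f(*args)

# Example: O = mean(diff(t, t-1)) over two frames of a dynamic image
frames = {"t": np.random.randint(0, 256, (8, 8), dtype=np.uint8),
          "t-1": np.random.randint(0, 256, (8, 8), dtype=np.uint8)}
output = evaluate(("mean", ("diff", "t", "t-1")), frames)
```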
Here, a structure of the processing program will be explained. As shown in
In this embodiment, an input image t of the current time t and input images t−1, . . . , t−k taken every M frames before the input image t are input to the processing program, and k is set to 3 and M is set to 1. That is, as shown in
Here, the values of k and M can be set appropriately. For example, if k is set to 2 and M is set to 3, a total of three images, i.e., the input image t of the current time t and the input images of three and six frames before the current time t, are read and input to the processing program as shown in
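The relation between k, M and the frames actually read can be summarized by a small helper; the function below is a hypothetical illustration of this convention, not part of the embodiment.

```python
# Hypothetical helper illustrating the k/M convention described above:
# k+1 input images are read, spaced M frames apart, starting from the
# current frame t (offset 0 means the current frame itself).
def input_frame_offsets(k, M):
    return [m * M for m in range(k + 1)]

print(input_frame_offsets(3, 1))  # [0, 1, 2, 3] -> t, t-1, t-2, t-3
print(input_frame_offsets(2, 3))  # [0, 3, 6]   -> t and the frames 3 and 6 before
```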
Although the general image filters F as shown in Table 1 are used in the processing program of the embodiment to enhance the calculation speed, it is also possible to add an image filter having a special function in accordance with a purpose.
The image processor 1 of the embodiment extracts a pedestrian as a specific object from an image of a landscape in front of the vehicle as described above, and the processing program also extracts a pedestrian from the input image t. That is, if input images t, t−1, t−2 and t−3 (see
In the embodiment, the output image O formed in this manner is superposed on the input image t displayed on the monitor of the display section 4 and is displayed. That is, the input image t sent from the image processing section 3 is displayed on the monitor of the display section 4 as described above, and the output image O formed by the processing program is superposed on the input image t and displayed as shown in
At that time, the output image O of the processing program can be subjected to image processing by a mask filter as shown in
The processing program may be constructed manually and given to the image processing section 3. In the tree structure processing program as shown in
In this embodiment, the processing program forming section 6 connected to the image processing section 3 automatically forms the processing program using the genetic programming technique.
The initial population producing means 61 produces a constant number q (q = 100 in this embodiment) of tree structure processing programs as shown in
As a rule for producing the processing programs at random, in this embodiment, the number of image filters F (i.e., non-terminal symbols) of the nodes constituting the tree structure processing programs is set so as not to exceed 40 at the maximum, not only in the initial population but also throughout the process of evolution until an optimized processing program BP is obtained. The image filters F are selected at random from the image filters shown in Table 1. A mask filter as shown in
As described above, in this embodiment, k is set to 3 and M is set to 1, and an input image to be input to the processing program is arbitrarily selected from the input images t, t−1, t−2 and t−3 of four continuous frames which are picked up at time intervals of 1/30 seconds in reverse chronological order from the current time t. It is unnecessary to use all of the four kinds of input images t, t−1, t−2 and t−3 as the input images of the processing program, and a processing program which uses only two kinds, i.e., the input image t and the input image t−2, or only the input image t−3 can be included in the initial population.
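As a rough illustration of how such a constrained random population might be generated, the following sketch builds nested-tuple trees in the representation assumed earlier. The filter names, growth probabilities and budget-splitting rule are illustrative assumptions; only the cap of 40 non-terminal symbols and the population size q = 100 come from the embodiment.

```python
import random

ONE_IN = ["mean", "min", "max", "invert"]   # illustrative one-input filters
TWO_IN = ["diff", "and", "or"]              # illustrative two-input filters
TERMINALS = ["t", "t-1", "t-2", "t-3"]      # k = 3, M = 1 as in the embodiment
MAX_FILTERS = 40                            # cap on non-terminal symbols

def random_tree(budget):
    # budget bounds the number of image filters F (non-terminal symbols)
    if budget <= 0 or random.random() < 0.3:
        return random.choice(TERMINALS)     # terminal symbol: an input image
    if budget >= 2 and random.random() < 0.5:
        half = (budget - 1) // 2
        return (random.choice(TWO_IN), random_tree(half), random_tree(half))
    return (random.choice(ONE_IN), random_tree(budget - 1))

initial_population = [random_tree(MAX_FILTERS) for _ in range(100)]  # q = 100
```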
The fitness evaluating means 62 is connected to the initial population producing means 61, and each processing program of the initial population produced by the initial population producing means 61 is sent to the fitness evaluating means 62.
In the fitness evaluating means 62, a simulation in which the input images t to t−3 are input to each processing program to obtain an output image O is carried out, the output image O obtained by the simulation and the target image T are compared with each other, and the fitness E of each processing program is calculated based on the following equation (1):
Equation 1
E = 1 − (1/N)·Σ[ Σx,y W·|O − T| / (Vmax·Σx,y W) ] (1)
(the outer sum is taken over the N learning sets, and the inner sums over the picture elements (x, y))
N: number of learning sets
O: output image
T: target image
W: weight image
Vmax: maximum gradation value
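A minimal sketch of this weighted fitness evaluation follows, assuming the normalized form reconstructed above (the exact normalization of equation (1) is not reproduced in the source). Here run_program stands for a simulation of the tree structure processing program on one learning set's input images.

```python
import numpy as np

def fitness(run_program, learning_sets, v_max=255.0):
    # run_program: callable simulating the processing program on a dict of
    # input images (t, t-1, ...), returning the output image O.
    # learning_sets: list of (inputs, T, W) triples, one per learning set.
    total = 0.0
    for inputs, T, W in learning_sets:
        O = run_program(inputs).astype(np.float32)
        weighted = np.sum(W * np.abs(O - T.astype(np.float32)))
        total += weighted / (v_max * np.sum(W))   # normalized weighted distance
    return 1.0 - total / len(learning_sets)       # 1.0 means a perfect match
```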
Here, the target image T is an image that should be output by the optimized processing program BP. In this embodiment, the purpose of the processing program is to extract a pedestrian from an image of a landscape in front of a vehicle. Therefore, an image (see
The weight image W is an image in which a weight w for weighting the distance |O−T| between the output image O and the target image T is defined for each picture element, and the weight w for each picture element is appropriately determined depending upon the purpose of the processing program to be constructed. Usually, the weight w is set large in a picture element region where it is strongly required that the output image O and the target image T match each other, and the weight w is set small in a picture element region where such matching is not strongly required.
Since the object of the embodiment is to extract a pedestrian and not to extract other things, it is strongly required that the output image O and the target image T match each other in both the extraction region EX and the non-extraction region NE of the target image T. However, if the weight w is set equally over the entire image, the area rate of the picture element region occupied by the pedestrian in the output image O (i.e., the extraction region EX) becomes smaller (the area ratio is 12:256) than the area rate of the other picture element region (i.e., the non-extraction region NE), and there is a possibility that the contribution of the matching degree in the non-extraction region to the fitness evaluation becomes excessively large.
Therefore, in this embodiment, the weight image W is an image similar to the target image T (see
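One way to build such a weight image, in which the weight ratio of the extraction region to the non-extraction region is the reciprocal of their area ratio, is sketched below; the binarization threshold is an assumption for illustration.

```python
import numpy as np

def weight_image(target, threshold=128):
    # Extraction region EX: picture elements of the target image T at or above
    # an assumed binarization threshold; the rest form the non-extraction
    # region NE. Weights are set so that (EX weight) : (NE weight) equals the
    # reciprocal of the EX : NE area ratio, as described above.
    ex = target >= threshold
    n_ex, n_ne = int(ex.sum()), int((~ex).sum())
    W = np.empty(target.shape, dtype=np.float32)
    W[ex] = 1.0 / max(n_ex, 1)
    W[~ex] = 1.0 / max(n_ne, 1)
    return W / W.max()          # normalize so the largest weight is 1.0
```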
In the fitness evaluating means 62, the fitness E of each processing program is calculated using the weight image W and the plurality of kinds of input images t, t−1, t−2 and t−3. In this embodiment, the simulation of the processing program is carried out using two or more sets S (hereinafter, learning sets S) each comprising a combination of the weight image W, the input images t, t−1, t−2 and t−3, and the target image T.
That is, as shown in
The parent selecting means 63 is connected to the fitness evaluating means 62, and each processing program whose fitness E is calculated by the fitness evaluating means 62 is sent to the parent selecting means 63.
The parent selecting means 63 selects, from the processing programs and based on the fitness E, 100 processing programs to remain for the next generation by roulette selection, expected value selection, ranking selection or tournament selection, and increases the processing programs. In this embodiment, the 100 processing programs are selected by tournament selection, and at the same time, elite preservation of the processing program having the maximum fitness E is carried out.
The 100 processing programs selected and increased by the parent selecting means 63 are sent to the cross means 64.
In the cross means 64, as shown in
In this embodiment, one-point cross as shown in FIG. 13 is carried out in the cross means 64, but other types of cross such as multi-point cross or uniform cross can also be employed.
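A sketch of one-point cross on the nested-tuple trees of the earlier illustration follows: one position is chosen at random in each parent and the subtrees at those positions are swapped. This is a generic illustration of the operation, not the embodiment's actual implementation.

```python
import random

def all_positions(tree, path=()):
    # enumerate the position (path of child indices) of every node
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_positions(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, sub):
    if not path:
        return sub
    parts = list(tree)
    parts[path[0]] = replace_subtree(parts[path[0]], path[1:], sub)
    return tuple(parts)

def one_point_cross(a, b):
    # choose one position in each parent and swap the subtrees there
    pa = random.choice(list(all_positions(a)))
    pb = random.choice(list(all_positions(b)))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))
```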
The 100 child processing programs produced by the cross means 64 are sent to the next mutation means 65.
In the mutation means 65, modification, insertion, deletion and the like of nodes are generated at a predetermined rate for each processing program. At that time, when the number of non-terminal symbols in the processing program would exceed 40 by the insertion of a node, the insertion is not carried out, and modification between a terminal symbol (i.e., the input image t or the like) and a non-terminal symbol (i.e., an image filter F) is prohibited. Mutation such as translocation and superposition may also be carried out, with appropriate limitations set at that time.
The fitness evaluating means 66 is connected to the mutation means 65, and a processing program of 100 populations produced by the mutation means 65 is sent to the fitness evaluating means 66. The same processing as that of the fitness evaluating means 62 is carried out in the fitness evaluating means 66, first to third learning sets which are the same as those used in the fitness evaluating means 62 are used, a simulation is carried out for each processing program, and fitness E is calculated based on the equation (1).
The termination determining means 67 is connected to the fitness evaluating means 66. Each processing program whose fitness E is calculated by the fitness evaluating means 66, and the elite-preserved processing program of the older generation having the maximum fitness preserved by the parent selecting means 63, are sent to the termination determining means 67, and it is determined whether the formation of the processing program in the processing program forming section 6 is completed.
In this embodiment, the termination determining means 67 determines whether the number of generations of the process of evolution has reached a preset number Ge of termination generations, and if it is determined that the number of generations has reached the number Ge of termination generations, the processing program BP having the maximum fitness E at that time is output to the image processing section 3 as the solution, and the formation of the program is completed. If the termination determining means 67 determines that the number of generations has not reached the number Ge of termination generations, the termination determining means 67 sends each processing program to the parent selecting means 63, and the above-described processing procedure is repeated.
In addition to this, the termination determining means 67 may determine whether there is a processing program whose fitness reaches a preset target fitness Eq among the processing programs, and if there is such a processing program, this processing program may be output to the image processing section 3 as the solution. It is also possible to employ such a configuration that the termination determining means 67 stores the maximum value of the fitness of the processing programs, and when the maximum value of the fitness does not vary even after a predetermined number of generations has elapsed, i.e., when the maximum value of the fitness stagnates, the procedure is completed at that generation, and the processing program having the maximum fitness is output to the image processing section 3 as the solution.
In the processing program forming section 6, a processing program BP which is optimized through the above-described process of evolution is formed, but a phenomenon of so-called excessive learning is found in the obtained processing program BP in some cases. That is, in the case of this embodiment, there is obtained in some cases a processing program BP which does not extract pedestrians in general, e.g., which does not extract a pedestrian wearing white clothes and extracts only a pedestrian wearing dark-colored clothes.
To avoid such excessive learning, in this embodiment, a fitness E′ in which the limitation of excessive learning is taken into account is calculated based on the following equation (2) from the fitness E calculated by the equation (1) in the fitness evaluation in the fitness evaluating means 62 and 66. Therefore, in this embodiment, the fitness E′ in which the limitation of excessive learning is taken into account is compared and referred to in the parent selecting means 63 and the termination determining means 67.
Equation 2
E′ = E − a·n(node) + b·m(2input_node) (2)
E′: fitness in which excessive learning limitation is taken into account
E: fitness calculated based on equation (1)
a, b: coefficients
n (node): number of nodes
m (2 input_node): number of nodes of two-input filter
Both the coefficients a and b are positive values. According to the equation (2), the fitness E′ in which the limitation of excessive learning is taken into account is calculated such that the fitness E′ is smaller as the number n of nodes in the processing program is greater, and the fitness E′ is greater as the number m of nodes of two-input image filters is greater.
The reason why the fitness E′ is constituted as described in the equation (2) is that, as the number of nodes of the tree structure processing program is greater, an object to be extracted is more limited and the program is more prone to fall into the excessive learning state, whereas as the number of nodes is smaller, a more general object (pedestrians in general in this embodiment) can be extracted and the general versatility is enhanced.
On the other hand, if the fitness E′ simply becomes smaller as the number of nodes is greater, the rate of two-input image filters in the tree structure of the processing program becomes smaller, and even if input of the four kinds of input images (i.e., the input images t, t−1, t−2 and t−3) is permitted as in the embodiment, the tendency that a processing program which actually inputs only a few kinds of input images is obtained becomes stronger. Thus, the fitness E′ is made greater as the number of nodes of two-input image filters is greater.
The coefficients a and b respectively represent the rate of the number of nodes and the rate of the number of nodes of two-input image filters with respect to the fitness E′. The coefficients a and b may be varied in accordance with the number of generations of the process of evolution of the genetic programming in the processing program forming section 6.
If both the coefficients a and b take large values when the number of generations is small and take smaller values as the generations proceed, processing programs in which the number of nodes is large are prone to be culled (effect of a), and the possibility that a processing program including many two-input image filters remains becomes high (effect of b). If, on the contrary, both the coefficients a and b become larger with the generations, processing programs specialized for the learning sets S obtained in the initial stage of evolution can be simplified in the latter half of evolution.
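A sketch of equation (2) with such a generation-dependent schedule follows. The concrete magnitudes of a and b and the linearly decaying schedule are illustrative assumptions, and n(node) is counted here over the filter nodes of the nested-tuple trees used in the earlier sketches.

```python
def count_nodes(tree):
    # returns (n, m): n counts the image filter nodes of the tree, and m the
    # two-input filter nodes among them (a two-input node has 3 tuple items)
    if isinstance(tree, str):          # terminal symbol
        return 0, 0
    n, m = 1, 1 if len(tree) == 3 else 0
    for child in tree[1:]:
        cn, cm = count_nodes(child)
        n, m = n + cn, m + cm
    return n, m

def penalized_fitness(E, tree, generation, total_generations):
    # Equation (2): E' = E - a*n(node) + b*m(2input_node).
    # The decaying schedule and magnitudes of a and b are illustrative only.
    decay = 1.0 - generation / float(total_generations)
    a, b = 1e-3 * decay, 5e-4 * decay
    n, m = count_nodes(tree)
    return E - a * n + b * m
```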
When the evolution proceeds and the maximum value of the fitness stagnates, if the values of the coefficients a and b are changed manually, the possibility that a more optimized processing program BP can be obtained becomes high.
The processing program BP formed by the processing program forming section 6 in the above described manner is sent to the image processing section 3 as described above. In this embodiment, as shown in
As a combining method, a logical sum may be obtained for each corresponding picture element of the n output images O obtained by the processing programs BP1 to BPn, and the binarized image can be used as the output image O of the large scale processing program. Alternatively, the mask filter shown in
In this embodiment, six processing programs BP obtained by the genetic programming at the processing program forming section 6 are combined to constitute the large scale processing program. In this large scale processing program, noise is removed from the output image O, and red color is more strongly displayed in a picture element where an image is extracted by more processing programs BP among the picture elements of the output image O. Therefore, an output result of the i-th processing program BPi in each picture element of the output image O is defined as Di, and an output brightness value D in each picture element of the output image O is determined based on the non-linear superposition shown in the following equation (3).
Equation 3
In the case of the embodiment, n is set to 6 and p is set to 2. A threshold value K is a constant and is set to 127 in the embodiment. Values of p and K may arbitrarily be set. If the value of p is set greater, a picture element in which an image is extracted can be more emphasized and displayed.
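Since the body of equation (3) is not reproduced above, the following sketch shows only one plausible form of such a non-linear superposition: each picture element grows brighter the more programs extract it, with the exponent p emphasizing agreement and K scaling the contribution of each program, using the stated n = 6, p = 2 and K = 127.

```python
import numpy as np

def combine_outputs(outputs, p=2, K=127):
    # outputs: the n output images O_i (gradation values 0..255) of the
    # processing programs BP1..BPn. A picture element extracted by more
    # programs receives a larger output brightness value D, clipped at 255.
    acc = np.zeros(outputs[0].shape, dtype=np.float32)
    for O in outputs:
        acc += (O.astype(np.float32) / 255.0) ** p
    return np.clip(K * acc, 0, 255).astype(np.uint8)
```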
Next, operation of the image processor 1 of the embodiment will be explained.
The image input section 2 (see
If the image processing section 3 receives the input image t from the image input section 2, the image processing section 3 sends the same to the display section 4 and displays the same on the monitor, and temporarily stores the input image t in the memory 5. At the same time, input images t, t−1, t−2 and t−3 stored in the memory 5 are read, the input images t, t−1, t−2 and t−3 are input to the processing program in which image filters F are combined in tree structure to form the output image O, and a red colored output image O is superposed on the input image t displayed as a monochrome image on the monitor of the display section 4 and is displayed.
As described above, the processing program may be constructed manually, but the processing program can also be formed in advance by the genetic programming in the processing program forming section 6.
The procedure for forming the processing program in the processing program forming section 6 is as described above. Here, one example of the processing program BP as solution formed by the genetic programming in the processing program forming section 6 is shown in
The processing program BP shown in
Looking at the processing program BP formed in the processing program forming section 6, it is frequently observed that processing by a difference filter is carried out at an early stage of the processing of the image filters F with respect to the input images t, t−1, t−2 and t−3. This is considered to be because the purpose of the processing program of the embodiment is to pick up a forward landscape from a moving vehicle and to extract a moving or stopping pedestrian from the image, and the pedestrian is extracted from the plurality of time-series input images t, t−1, t−2 and t−3 in which the position of the pedestrian gradually varies slightly.
Therefore, instead of inputting all of the images as shown in
As described above, according to the image processor 1 of the embodiment, the plurality of kinds of input images t, t−1, . . . , t−k of the front landscape of the vehicle picked up at time intervals can be input to the processing program in which the image filters F are combined in tree structure. With this, a plurality of frames of a dynamic image can be input to the processing program, and the dynamic image can be subjected to the image processing.
Frames of the dynamic image (e.g., the forward landscape of a vehicle) are compared with each other by the various image filters F such as the difference filter constituting the tree structure processing program and subjected to image processing such as difference processing. With this, it is possible to effectively form an output image O in which a specific object (a pedestrian in the case of this embodiment) causing temporal variation and displacement is extracted from the dynamic image.
By automatically forming the processing program BP by genetic programming in the processing program forming section 6, the processing program BP can easily be obtained. By changing the target image T and the weight image W, a specific object to be extracted can easily be changed. That is, unlike the conventional technique, it is possible to easily construct the processing program BP in the same procedure while using the above-described genetic programming method as it is only by changing the target image T and weight image W without manually constructing a processing program for extracting the specific object.
At that time, in forming the processing program BP, when learning is carried out while using only one learning set S comprising a combination of the input images t, t−1, . . . , t−k shown in
However, if a plurality of learning sets S comprising the combination of the input images t, t−1, . . . , t−k, the target image T and the weight image W are used for forming the processing program BP as in this embodiment, it is possible to avoid such a phenomenon. Further, it becomes possible to more reliably extract a person from a landscape that is not used in the learning set S in the genetic programming as shown in
If the plurality of processing programs obtained in this manner are combined to form a large scale processing program, such effects can further effectively be exhibited.
When a processing program is obtained by the genetic programming, as the number of image filters F (non-terminal symbols) constituting the processing program BP increases, the search space of the solution program usually increases exponentially, and an enormous amount of search is required. However, if a plurality of processing programs BP formed by using different learning sets S are combined as in this embodiment, it is possible to obtain a general processing program BP capable of extracting a specific object more easily and reliably.
It is an object of the embodiment to take a picture of a forward landscape from a moving vehicle and to extract a pedestrian from the image. In addition to this, it is possible to extract a vehicle from the forward landscape, to extract a general moving object such as a vehicle and a pedestrian, or to extract a boundary between a roadway and a sidewalk. Further, they can be combined so that a boundary between a roadway and a sidewalk is extracted and a vehicle or pedestrian moving on a roadway therebetween is extracted.
Although the output image O is superposed on the input image t and they are displayed in this embodiment, if the image processor 1 of the embodiment and other device are combined, it is also possible to send a specific object extracted by the image processor 1 of the embodiment to the other device and monitor the same, or to measure a distance to the object by the other device.
By combining the image processor 1 of the embodiment and a distance measuring device, and by specifying a pedestrian by the image processor 1 of the embodiment and measuring a distance to the pedestrian by the distance measuring device, it is possible to give an alarm when approaching, and to control the running to avoid collision. The distance measuring device need not measure a distance to an object in the entire region in front of a vehicle, and this reduces a burden.
The image processor 1 of the embodiment can be mounted not only on a vehicle but also on an artificial intelligent robot. For example, the image processor 1 can be used for finding and measuring another moving object while observing an environment using a camera provided on the robot, and for determining an action of the robot with respect thereto.
A second embodiment of the image processor of the invention will be explained with reference to the drawings. In the second and third embodiments, image processors which are mounted on a vehicle for extracting an oncoming vehicle from a landscape image in front of the vehicle will be described, but the present invention is not limited to such image processors.
Since the structure of the image input section 12 is the same as that of the image input section 2 of the first embodiment, explanation thereof will be omitted. An image signal output from an imaging apparatus 121 is converted into digital gradation values of 256 levels by A/D conversion in an A/D converter, and the signal is subjected to correction processing such as brightness correction and noise removal and to geometric correction processing such as affine conversion. The same applies also to the first embodiment.
The image processing section 13 subjects an input image t as shown in
That is, the image processing section 13 reads, from the memory 15 connected to the image processing section 13, a processing program BP comprising various image filters F combined in tree structure formed by the processing program forming section 16, develops the same in the RAM, and produces an input image t of the current time as shown in
The image processing section 13 carries out the image processing, sends, to the memory 15, the plurality of input images sent from the image input section 12 and stores the input images therein in succession. In this embodiment, the display section 14 having a monitor and the input section 17 having a keyboard and a mouse are connected to the image processing section 13.
A structure of the processing program will be explained. The processing program is a program comprising various image filters F combined in tree structure as shown in
In this embodiment, the image filters F are selected from the image filters F which input one or two image data sets shown in the following Tables 2 and 3. An image filter F which inputs one image (Table 2) is a one-input image filter, and an image filter F which inputs two images (Table 3) is a two-input image filter.
In the processing program of this embodiment also, like the selecting method of input images shown in
A plurality of different input images can be selected by another selecting method and these input images can be input to the processing program. A general image filter F as shown in Table 2 or 3 is used in the processing program of the embodiment for enhancing the calculation speed, but it is also possible to add an image filter having a special function depending upon purpose.
The image processing section 13 produces an optical flow image OF from the plurality of images in addition to the plurality of input images t, t−1 and t−2, and inputs the optical flow image OF to the processing program. In this embodiment, the image processing section 13 produces the optical flow image OF by block matching processing from the input image t and the input image t−1.
In the block matching processing, the input image t of the current time t, which is the reference, is divided into 4×4 picture element blocks, and the matching processing with the input image t−1 is carried out for each picture element block. As matching processing methods, various methods such as the SAD method, the SSD method and the normalized correlation method are known, and it is possible to employ any of them.
The SAD method which is employed in this embodiment will be explained briefly. As shown in
Further, a block of 4×4 picture elements having its origin at coordinates (k, l) is set on the input image t−1 which is the object of the matching processing, and i and j are taken in the same manner as described above. With this, the coordinates of the picture elements on the block are indicated as (k+i, l+j).
According to such definitions, the total sum of the absolute values of the differences between the brightness value At i,j of each picture element in the picture element block on the input image t and the brightness value At−1 k+i,l+j of the corresponding picture element in the block on the input image t−1, i.e., the city block distance Ck,l, is expressed by the following equation (4).
Equation 4
Ck,l = Σi=0..3 Σj=0..3 |At i,j − At−1 k+i,l+j| (4)
According to the SAD method, the input image t−1 is raster scanned while shifting one picture element at a time, and the block where the city block distance Ck,l becomes minimum is defined as the block corresponding to the picture element block on the input image t. A flow vector F from the block on the input image t−1 to the picture element block on the input image t is defined as the optical flow of the picture element block on the input image t. The above-described operation is carried out for all of the picture element blocks in the input image t, and with this, an optical flow image OF in which an optical flow is calculated for each picture element block is produced.
In this embodiment, to shorten the calculation time of the city block distance Ck,l, the raster scanning is carried out only in a region of a constant range including the position of the picture element block projected on the input image t−1, instead of carrying out the raster scanning over the entire region of the input image t−1.
When a wall of a building or a road surface is imaged, the difference in brightness value among the 16 picture elements is small and the picture element block on the input image t has poor characteristics. Since there are then many blocks having similar brightness characteristics on the input image t−1, there is a possibility that the matching is erroneous even if a corresponding block is found in accordance with the equation (4).
Hence, in this embodiment, a flow vector F calculated by the matching processing is regarded as reliable only when the following two conditions are satisfied, and this flow vector F is defined as the optical flow corresponding to the picture element block on the input image t. If one of the two conditions is not satisfied, it is determined that the reliability of the flow is low, and the optical flow of the picture element block is set to 0.
(Condition 1) The minimum value Cmin of the city block distance Ck,l of the picture element block is equal to or lower than a preset threshold value Ca.
(Condition 2) A difference between the maximum brightness value and the minimum brightness value of 16 picture elements constituting the picture element block is equal to or higher than a threshold value A′.
Therefore, in this embodiment, the image processing section 13 divides the input image t into picture element blocks and then determines whether each picture element block satisfies the condition 2; if the condition 2 is not satisfied, the image processing section 13 does not carry out the matching processing for the picture element block, and an optical flow of size 0 is allocated to the picture element block. When a picture element block satisfies the condition 2 but does not satisfy the condition 1, i.e., when the city block distance Ck,l is greater than the threshold value Ca, the possibility that the matching is erroneous is high and the reliability is low; thus, an optical flow of size 0 is allocated to the picture element block.
In this embodiment, the optical flow image OF is produced as an image in which a size of the optical flow calculated for each picture element block, i.e., the length of the flow vector F is converted into a gradation value of each picture element block. A gradation value of a picture element block to which an optical flow of size 0 is allocated is 0.
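A sketch of this SAD-based production of the optical flow image follows. The search range and the threshold values standing in for Ca and A′ are placeholders (their actual values are not given above), and the flow magnitude is scaled into a gradation value in an assumed way.

```python
import numpy as np

def optical_flow_image(img_t, img_t1, block=4, search=4, Ca=200, A_prime=10):
    # SAD block matching between input image t (reference) and input image t-1.
    # For each 4x4 picture element block, the city block distance of
    # equation (4) is minimized over a limited search region; blocks failing
    # condition 1 (Cmin <= Ca) or condition 2 (brightness range >= A') keep
    # gradation value 0.
    H, W = img_t.shape
    gh, gw = H // block, W // block
    OF = np.zeros((gh, gw), dtype=np.uint8)
    t = img_t.astype(np.int32)
    t1 = img_t1.astype(np.int32)
    max_mag = np.hypot(search, search)
    for by in range(gh):
        for bx in range(gw):
            y, x = by * block, bx * block
            ref = t[y:y + block, x:x + block]
            if ref.max() - ref.min() < A_prime:          # condition 2
                continue
            c_min, best = None, None
            for dy in range(-search, search + 1):        # limited raster scan
                for dx in range(-search, search + 1):
                    k, l = y + dy, x + dx
                    if k < 0 or l < 0 or k + block > H or l + block > W:
                        continue
                    C = np.abs(ref - t1[k:k + block, l:l + block]).sum()
                    if c_min is None or C < c_min:
                        c_min, best = C, (dy, dx)
            if c_min is not None and c_min <= Ca:        # condition 1
                mag = np.hypot(best[0], best[1])         # flow vector length
                OF[by, bx] = min(255, int(mag / max_mag * 255))
    return OF
```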
As can be found from comparison between
The image processing section 13 inputs the optical flow image OF and the input images t, t−1 and t−2 produced in this manner to the processing program to form the output image O. When they are input to the processing program, in order to match the resolutions of the input images t, t−1 and t−2 with that of the optical flow image OF, one picture element block of the optical flow image OF may be treated as 16 picture elements having the same gradation value; in this embodiment, however, to enhance the calculation speed in the processing program, the processing is carried out using images compressed such that the input images t, t−1 and t−2 are adjusted to the resolution of the optical flow image OF.
As a result of image processing by the processing program in the image processing section 13, the output image O as shown in
In this embodiment, as shown in
When the input image t and the output image O are superposed on each other and displayed, the output image O of the processing program can be subjected to the image processing by the mask filter as shown in
The information of the obtained output image O is displayed on the monitor of the display section 14, or instead of displaying the output image O, it can be sent to a control device which controls a subject vehicle to automatically control the vehicle to avoid danger.
Next, the processing program forming section 16 of the image processor 11 of the embodiment will be explained. The processing program can be constructed manually and used for the image processing in the image processing section 13. In the tree structure processing program as shown in
In the embodiment, in the processing program forming section 16 connected to the image processing section 13, the processing program BP is automatically formed by the genetic programming technique. In this embodiment, the processing program BP is formed in advance by the processing program forming section 16 and stored in the memory 15 before the image processing which is carried out by the image processing section 13 at the same time as the image pickup by the imaging apparatus 121.
The processing program forming section 16 of the embodiment has the same structure as that of the processing program forming section 6 of the first embodiment, and thus explanation thereof will be omitted. In this embodiment, however, in the process of evolution until not only the initial population but also the optimized processing program is obtained, the image filters F of the nodes constituting the tree structure processing program are selected at random from the image filters shown in Tables 2 and 3, and there is a limitation that at least one of the terminal symbols, i.e., the images to be input to the tree structure processing program, is an optical flow image OF.
In the mutation means, when modification of a terminal symbol would change the optical flow image OF into the input image t, t−1, t−2 or the like such that no optical flow image OF would be included in the terminal symbols of the processing program, such modification is prohibited.
An input image as shown in
In this embodiment also, the termination determining means determines whether the number of generations in the process of evolution has reached the preset number Ge of termination generations, and if it is determined that it has, the processing program BP having the maximum fitness E is output to the image processing section 13 as the solution, and the program forming operation is completed.
Here, generation of excessive learning is avoided in the same manner as that of the first embodiment.
The optimized processing program BP formed by the processing program forming section 16 is sent to the image processing section 13 and stored in the memory 15. One example of the processing program BP formed by the processing program forming section 16 is shown in
Next, operation of the image processor 11 of the embodiment will be explained.
In the image processor 11, first, the processing program BP is formed. In the forming stage of the processing program BP, images of the forward landscape of the subject vehicle, which are the basis of the formation of the processing program, are picked up as a dynamic image by the imaging apparatus 121, i.e., as a plurality of frames picked up every 1/30 seconds, and they are stored in the memory 15 through the image processing section 13. In this embodiment, since the object to be extracted is an oncoming vehicle, a dynamic image in which the oncoming vehicle is picked up is stored.
Next, the number of input images to be input to the processing program and the frame interval, i.e., k and M, are set. When k is set to 2 and M is set to 1 as in this embodiment, three appropriate continuous frames in which the oncoming vehicle is picked up are selected as the input images t, t−1 and t−2 from the frames stored in the memory 15 as shown in
A target image T and a weight image W as shown in
In this embodiment, the optimized processing program BP is produced using the plurality of learning sets S1 to S3. Therefore, a second learning set S2 is formed in the same manner using an input image ta at a time ta before the time t corresponding to the input image t, and a third learning set S3 is formed in the same manner using an input image tb at a time tb after the time t, and these are input to the processing program forming section 16.
A set value q of the size of the initial population in the initial population producing means and the number Ge of termination generations in the termination determining means are input to the processing program forming section 16. The processing program forming section 16 produces the initial population of processing programs using the various image filters F shown in Tables 2 and 3, parent selection, cross, mutation and the like are carried out during the process of evolution in the genetic programming, and the fitness is evaluated. In this manner, the optimized processing program BP as shown in
At the execution stage of the image processing using the processing program, the image processing section 13 first reads the processing program BP from the memory 15 and develops the same in the RAM. If the image processing section 13 receives an input image t from the imaging apparatus 121, the image processing section 13 produces an optical flow image OF from the input image t and the input image t−1, and inputs the optical flow image OF and the input images t, t−1 and t−2 to the processing program BP.
At the same time, the image processing section 13 sends the input image t to the display section 14 to display the same on the monitor, and the input image t is temporarily stored in the memory 15. If the calculation of the processing program is completed and the output image O is output, the image processing section 13 sends a result thereof to the display section 14 and as shown in
According to the image processor 11 of the embodiment, like the first embodiment, the plurality of input images t, t−1, . . . , t−k in the dynamic image picked up at time intervals are input to the processing program in which the image filters F are combined in tree structure. With this, the functions of the various image filters F such as the difference filter constituting the tree structure processing program are effectively exhibited, and a specific object can effectively be extracted from the dynamic image.
The present invention proves that the ACTIT technique, which uses a tree structure processing program constituted so as to input the same static image as in the conventional technique, can be applied even when a dynamic image is used, and that the ACTIT technique can be expanded to the extraction of a specific object from the dynamic image.
In addition, according to the image processor 11 of the embodiment, all of the effects of the image processor 1 of the first embodiment can be exhibited.
In addition to the plurality of landscape images picked up by the imaging apparatus as input images, the optical flow image OF produced from these images is input. With this, a region on the image corresponding to a moving object indicating a peculiar flow in the dynamic image can be given to the processing program in a clearer state. Thus, when the specific object to be extracted from the dynamic image is a moving object, a processing program which extracts the specific object reliably and precisely can be formed.
In the processing program forming section 16, if a processing program BP is automatically formed by genetic programming, the processing program BP can easily be obtained. If the target image T and the weight image W are changed, a specific object to be extracted can easily be changed.
That is, unlike the conventional technique, it is unnecessary to manually construct a processing program for extraction whenever the specific object is changed. The processing program BP can be constructed by the same procedure while using the above-described genetic programming method as it is, only by changing the target image T and the weight image W, inputting them to the processing program forming section 16, and producing and inputting the optical flow image OF. Therefore, the processing program BP can easily be obtained, and at the same time, a general image processor can be obtained.
The optical flow image OF is input as an input image of the processing program. With this, as compared with a case in which the optical flow image OF is not used as the input image as shown in
If a processing program for extracting an oncoming vehicle from a dynamic image as in this embodiment and a processing program formed for processing other object, such as a processing program for extracting a front vehicle or a processing program for extracting a pedestrian as in the first embodiment are combined, it is possible to obtain a large scale processing program capable of achieving a wider object.
The third embodiment is different from the second embodiment in that images after conversion processing are input as the input images t, t−1, . . . , t−k to be input to the processing program, instead of inputting the picked up images of the front landscape of the subject vehicle picked up by the imaging apparatus 121 as they are or inputting compressed images as described in the second embodiment.
Therefore, in this embodiment, the optical flow image is also produced based on the input images t, t−1 after the conversion processing and is input to the processing program. Images corresponding to the input images t, t−1 after the conversion processing are used as a target image T and a weight image W used when forming a processing program BP which is optimized using the genetic programming technique by the processing program forming section 16.
In this embodiment, an input image after the conversion processing is called a converted image. Structures of the image processor 11, the processing program forming section 16 and the tree structure processing program are the same as those of the second embodiment shown in
In this embodiment also, a case in which k is set to 2 and M is set to 1, i.e., a case in which an input image t at the current time t as shown in
In this embodiment, the image processing section 13 of the image processor 11 converts input images t, t−1 and t−2 sent from the imaging apparatus 121 into images as viewed from above in a pseudo manner, i.e., converts the images such that the vantage point is set upward. A principle of conversion will be explained below.
At that time, the following equation (5) is established.
H/D=g(Y−s)/f (5)
If the equation (5) is solved for D,
D=Hf/g(Y−s) (6)
is obtained.
That is, the input image t is used as the basic image for the conversion, and D is obtained from the j coordinate Y of the point R in the input image t based on the equation (6). With this, the distance D to the point R on the actual road surface can be obtained. Not only the distance D in the longitudinal direction as viewed from the subject vehicle but also the distance in the lateral direction as viewed from the subject vehicle (hereinafter, distance d) can be converted in the same manner.
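For illustration only, with assumed values H = 1.2 m, f = 6 mm, g = 0.012 mm per picture element and Y − s = 50 picture elements (none of these values are given in this description), the equation (6) yields D = (1.2 × 0.006)/(0.000012 × 50) = 12 m as the longitudinal distance to the point R.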
The gradation value of a picture element indicating the point R in the input image t is plotted on a d-D plane after the conversion, where the upper left end is the origin, the horizontal axis is the distance d in the lateral direction, and the vertical axis is the distance D in the longitudinal direction. With this, a converted image t′ having a gradation value in each picture element, in a state as viewed from above in a pseudo manner, can be obtained.
The conversion in this case is carried out on the assumption that the road surface is horizontal and that everything shown in the input image t is on the road surface, even though it has a height in the actual scene. Since such a rough assumption is included in the conversion, the expression "pseudo manner" is used in this invention.
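A minimal sketch of this conversion in Python is given below. It assumes a pinhole model in which the lateral relation is d = g(X − c)D/f, with X the i coordinate of a picture element and c the image center column; c, the range parameters and the function name are illustrative assumptions and are not symbols defined in this description.

```python
import numpy as np

def to_top_view(img, H, f, g, s, c, d_range, D_range, out_shape):
    """Convert a front-view image into a pseudo top view on the d-D plane,
    assuming every picture element lies on a flat, horizontal road surface.

    img : 2-D uint8 front-view image (rows are j, columns are i)
    H   : camera height above the road surface
    f   : focal length
    g   : size of one picture element on the imaging surface
    s   : j coordinate of the vanishing line of the road surface
    c   : i coordinate of the image center column (assumed symbol)
    d_range, D_range : (min, max) lateral and longitudinal extents
    out_shape        : (rows, cols) of the converted image t'
    """
    rows, cols = out_shape
    top = np.zeros(out_shape, dtype=img.dtype)
    d_min, d_max = d_range
    D_min, D_max = D_range
    for j in range(img.shape[0]):
        if j <= s:                       # at or above the horizon: skip
            continue
        D = H * f / (g * (j - s))        # equation (6)
        if not (D_min <= D < D_max):
            continue
        for i in range(img.shape[1]):
            d = g * (i - c) * D / f      # assumed lateral relation
            if not (d_min <= d < d_max):
                continue
            # d-D plane: origin at the upper left, d on the horizontal
            # axis, D on the vertical axis.
            u = int((d - d_min) / (d_max - d_min) * cols)
            v = int((D - D_min) / (D_max - D_min) * rows)
            top[v, u] = img[j, i]
    return top
```

The loop form is chosen for readability over speed; in practice the mapping would be precomputed once, since it depends only on the camera geometry.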
A result of processing of the converted image t′ as viewed from above in the pseudo manner is converted again in accordance with the following relation, which is the inverse conversion of the equation (6):

Y = s + Hf/(gD) . . . (7)
With this, the image can be completely restored to a state where the front of the subject vehicle is picked up, as in the input image t shown in
If an optical flow image OF′ is produced from the converted image t′ and a converted image t−1′ (not shown) in the same manner as that of the second embodiment, the optical flow image OF′ becomes an image as shown in
In the execution stage of the processing program in the image processing section 13, the image processing section 13 converts the input image t sent from the imaging apparatus 21 into the converted image t′, produces the optical flow image OF′ from the converted image t′ and the converted image t−1′ which has already been converted, and inputs the converted images t′, t−1′ and t−2′ and the optical flow image OF′ to the processing program.
Then, the output image O′ from the processing program is converted in the reverse manner of the equation (6), i.e., by

Y = s + Hf/(gD) . . . (7)

which is obtained by rearranging the equation (5), so as to obtain the output image O corresponding to the original input image t shown in
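At this step, the inverse conversion can be sketched in Python under the same assumptions as the top-view sketch above; it walks over the cells of the processed top-view image, recovers (d, D) from the grid position, and writes each gradation value back to the front-view coordinates given by the equation (7).

```python
import numpy as np

def to_front_view(top_img, H, f, g, s, c, d_range, D_range, out_shape):
    """Restore a processed pseudo top-view image, e.g. the output O',
    to the front-view geometry of the original input image t."""
    rows, cols = top_img.shape
    front = np.zeros(out_shape, dtype=top_img.dtype)
    d_min, d_max = d_range
    D_min, D_max = D_range
    for v in range(rows):
        # Longitudinal distance represented by this row of the top view.
        D = D_min + (v / rows) * (D_max - D_min)
        if D <= 0:
            continue
        Y = s + H * f / (g * D)          # equation (7)
        j = int(round(Y))
        if not (0 <= j < out_shape[0]):
            continue
        for u in range(cols):
            d = d_min + (u / cols) * (d_max - d_min)
            i = int(round(c + f * d / (g * D)))   # assumed lateral inverse
            if 0 <= i < out_shape[1]:
                front[j, i] = top_img[v, u]
    return front
```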
In the forming stage of the processing program BP, a target image T′ as shown in
A result of processing by the processing program BP which is formed and optimized in this manner is as shown in
That is, when the converted images t′, t−1′, . . . , t−k′ and the optical flow image OF′ were input as the input images of the processing program, it was confirmed that, as a secondary effect, the fitness E of the processing program BP in the genetic programming was enhanced more swiftly as the number G of generations increased, as compared with the case in which the optical flow image was not used as an input image as in the first embodiment (the lowest graph in the drawing) and the case in which the input images t, t−1, . . . , t−k and the optical flow image OF were input (the second graph from the bottom in the drawing).
According to the image processor 11 of the embodiment, as described above, the same effect as that of the second embodiment can be obtained.
As shown in
As can be seen from comparison between
It is conceived that this is because, since the converted images t′, t−1′ and t−2′ and the optical flow image OF′ based on the converted image t′, all converted into states as viewed from above in the pseudo manner, are used, the moving object stands out extremely clearly through its flow vectors in the optical flow image OF′.
That is, as compared with the optical flow image OF produced from the input images t and t−1 obtained by picking up the forward landscape as in the second embodiment, in the optical flow image OF′ produced from the converted images t′ and t−1′ as viewed from above in the pseudo manner as in this embodiment, it is possible to clearly distinguish between a flow caused by the running state of the subject vehicle and a flow caused by motion of an object moving in the space in front of the imaging apparatus, and the moving object stands out clearly in the optical flow image OF′. Therefore, this embodiment is especially effective and precise for extracting a moving object from a dynamic image.
Hence, as a modification of this embodiment, in order to further clarify the difference in flow between the moving object and the stationary object in the optical flow image OF′, the flow vector F of each picture element block may be converted into a flow vector Fc expressed as a flow with respect to the road surface.
More specifically, in this modification, the input image t is converted into the converted image t′ in a state as viewed from above in the pseudo manner as described above. At that time, based on the moving state of the imaging apparatus 121, i.e., the running state of the subject vehicle on which the imaging apparatus 121 is mounted, a flow with respect to the road surface is added to the flow vector F of each picture element block of the optical flow image OF′ produced from the converted image t′.
For example, if the forward landscape is picked up in a state in which the subject vehicle is running forward, each flow vector F is calculated in a state in which a downward flow having substantially equal magnitude is added over the optical flow image OF′. If the landscape is picked up in a state in which the subject vehicle is turning leftward, each flow vector F is calculated in a state in which a rightward flow is added to the optical flow image OF′.
Therefore, the vehicle speed or the yaw rate is measured by a vehicle speed sensor or a yaw rate sensor, the flow caused by the running state of the subject vehicle is calculated from the measured value, and, as in the case of the stationary object shown in
The already calculated flow vector F is then converted into the flow vector Fc for each picture element block. With this, it becomes possible to clearly distinguish between the moving object and the stationary object, and the moving object can reliably be recognized in the modified optical flow image OF′. If such an optical flow image OF′ is used, the moving object can be extracted from a dynamic image more precisely.
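A minimal sketch of this conversion from F to Fc is given below. The sensor interface, the per-row distance table D_of_row and the grid scales are assumptions for illustration; the signs follow the convention described above, in which a forward run adds a downward (positive j) flow and a left turn adds a rightward (positive i) flow.

```python
import numpy as np

def compensate_ego_motion(flow, speed, yaw_rate, dt,
                          px_per_m_D, px_per_m_d, D_of_row):
    """Convert the raw flow vectors F of the top-view optical flow image
    OF' into road-relative flow vectors Fc by subtracting the flow
    predicted from the running state of the subject vehicle.

    flow      : (rows, cols, 2) array of (i, j) flow per picture element block
    speed     : vehicle speed [m/s] from the vehicle speed sensor
    yaw_rate  : yaw rate [rad/s] from the yaw rate sensor (left turn > 0)
    dt        : frame interval [s]
    px_per_m_D, px_per_m_d : top-view grid scales [pixel/m] (assumed uniform)
    D_of_row  : metric longitudinal distance represented by each row
    """
    fc = flow.astype(np.float64)         # astype returns a copy
    # A forward run adds a substantially uniform downward flow.
    fc[..., 1] -= speed * dt * px_per_m_D
    for v, D in enumerate(D_of_row):
        # A left turn rotates the scene, adding a rightward flow that
        # grows with the distance D (small-angle approximation).
        fc[v, :, 0] -= D * yaw_rate * dt * px_per_m_d
    return fc
```

After this compensation, the flow vectors Fc of stationary points become substantially zero, so that only the moving object retains a significant flow.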
As another modification of this embodiment, instead of producing the optical flow image OF′ by color-coding each picture element block in light and dark with a gradation value corresponding to the magnitude of the flow vector F calculated as in this embodiment, or of the converted flow vector Fc, it is also possible to produce the optical flow image OF′ in correspondence with information on the direction of the calculated flow vector F or of the converted flow vector Fc.
For example, attention is paid to the j component of the flow vector F calculated from the converted images t′ and t−1′, or of the converted flow vector Fc, i.e., the vertical component in the optical flow image OF′. When the j component is 0 or lower, i.e., when the point is moving away from the subject vehicle, the gradation value of the picture element block is set to 0; when the j component is a positive value, i.e., when the point is approaching the subject vehicle, a gradation value corresponding to the j component is allocated to the picture element block. With this, an optical flow image OF′ suitable for extracting an oncoming vehicle can be obtained.
Especially, if attention is paid to the j component of the flow vector Fc converted to a flow with respect to the road surface, an oncoming vehicle can be extracted more clearly.
If the optical flow image OF′ is given a gradation value only when the j component of the converted flow vector Fc has a negative value, it is suitable for extracting a forward vehicle. If attention is paid to the i components of the flow vectors F and Fc, i.e., the lateral component in the optical flow image OF′, an optical flow image OF′ suitable for extracting a pedestrian crossing a road can be obtained.
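A minimal sketch of this direction-based coding is given below; the gradation gain and the function name are assumptions for illustration.

```python
import numpy as np

def direction_coded_flow_image(fc, target="oncoming", gain=32.0):
    """Color-code each picture element block by a chosen component of the
    (road-relative) flow vector Fc instead of by its magnitude.

    fc     : (rows, cols, 2) array of (i, j) flow per block
    target : which object the coded image OF' should emphasize
    gain   : assumed gradation units per unit of flow
    Returns a uint8 image with gradation values 0..255.
    """
    fi, fj = fc[..., 0], fc[..., 1]
    if target == "oncoming":      # approaching: positive j component only
        val = np.where(fj > 0, fj, 0.0)
    elif target == "forward":     # receding: negative j component only
        val = np.where(fj < 0, -fj, 0.0)
    elif target == "crossing":    # lateral motion: i component
        val = np.abs(fi)
    else:
        raise ValueError(f"unknown target: {target}")
    return np.clip(val * gain, 0.0, 255.0).astype(np.uint8)
```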
These modifications can be applied to the image processor 11 of the second embodiment.
The entire disclosure of Japanese Patent Application No. 2004-373452 filed on Dec. 24, 2004 and the entire disclosure of Japanese Patent Application No. 2005-101273 filed on Mar. 31, 2005 are incorporated in this application.
The image processor of the present invention is effective as an image processor to be mounted on various vehicles such as a passenger vehicle, a bus, a truck and a trailer, on an artificial intelligent robot, or on a monitoring apparatus having a camera. The image processor of the invention is suitable for securing safety by extracting a specific object from an image: in a vehicle, to avoid collision; in a robot, to find other moving objects while observing an environment using the robot's camera, to measure the moving object, and to determine the action of the robot with respect to the moving object; and in a monitoring apparatus having a camera, to find a moving object, monitor it and give an alarm.
Number | Date | Country | Kind
---|---|---|---
2004-373452 | Dec 2004 | JP | national
2005-101273 | Mar 2005 | JP | national
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/JP05/23595 | 12/22/2005 | WO | | 1/22/2007