The present invention relates to an image recognizing apparatus and method. More specifically, the invention relates to an image recognizing apparatus and method for recognizing the behavior of a mobile unit using images of the external environment taken while the mobile unit is moving.
Optical flow detection, which calculates, for example, the gradient change of image density from a sequence of input images, is a well-known method for recognizing the behavior of a mobile unit from images of its external environment.
Japanese Patent Application Unexamined Publication (Kokai) No. 2000-171250 discloses a method for detecting the current position of a mobile unit by means of optical flow. According to this method, the mobile unit is first moved along a given course, and optical flows of the scenes along the course are acquired at every predetermined distance. The relationship between each optical flow and the point at which it was acquired is then stored. Afterwards, an optical flow of the course is newly detected and matched against all of the stored optical flows. The stored optical flow showing the best matching result is selected, and the point associated with it is recognized as the current position of the mobile unit.
Japanese Patent Application Unexamined Publication (Kokai) No. H11-134504 discloses another method, which calculates optical flow from moving images and processes the optical flow with a neural network layer to recognize the behavior of the mobile unit. The method further determines the necessary processing based on the recognition result. With this method, the approach of the mobile unit to an obstacle can be detected with a simple neural network based on moving images.
However, to implement the former method, many relationships between optical flows and points on the course must be stored beforehand by moving the mobile unit along the predetermined course.
In general, recognizing the position and behavior of a mobile unit by feature extraction based only on moving images, as in the latter method, has several problems. For example, because the relative distance between a light source and the camera on the mobile unit changes continuously as the mobile unit moves, image intensity such as lightness varies so greatly that accurate feature extraction is difficult. In addition, vibration of the moving mobile unit is transmitted to the camera, which degrades the accuracy of feature extraction. If a recognizing system is configured to execute a smoothing process over multiple image frames to remove the adverse effects of intensity variation or vibration, the computing load becomes too heavy, or feature extraction becomes difficult when the image moves fast because of large temporal variation.
It is an objective of the present invention to provide an image recognizing apparatus and method that can recognize the behavior of a mobile unit rapidly and accurately in real applications, using images of the external environment taken on the mobile unit.
An image recognizing apparatus according to the invention performs learning in an online-like fashion in the real application, without executing a smoothing process to remove noise as in the conventional art, and improves robustness against environmental variation by using such noise as data for feature extraction.
The image recognizing apparatus according to the invention comprises behavior command output means for outputting behavior commands that cause a mobile unit to move, and local feature extraction means for extracting features of local areas within an image of the external environment acquired on the mobile unit when a behavior command is output. The apparatus further comprises global feature extraction means for extracting a feature of the global area of the image using the extracted features of the local areas, and learning means for calculating probability models for recognizing the behavior of the mobile unit based on the extracted global feature.
The local feature extraction means extracts the features of the local areas within the image using image intensities obtained by applying the positive and negative components of Gabor filters to two images. Preferably, Gabor filters oriented in eight different directions are applied.
The global feature extraction means combines the features of the local areas into the global feature with the use of Gaussian functions.
The probability models are preferably generated using the expectation maximization (EM) algorithm and supervised learning with a neural network, but any other learning algorithm may be used.
Once the probability models have been generated, highly accurate behavior recognition of the mobile unit is possible by applying the models to a newly acquired image. Accordingly, the apparatus may comprise behavior recognition means for applying Bayes' rule, using the probability models, to a newly acquired image. The behavior recognition means calculates a confidence for each of the behavior commands to recognize the behavior of the mobile unit.
It is desirable that the accuracy of behavior recognition always exceed a certain level. Therefore, the apparatus according to the invention further comprises behavior assessment means for comparing the confidence with a predetermined value to assess the recognized behavior, attention generation means for generating, based on the result of the assessment, an attentional demand requesting that the probability models be updated, and attentional modulation means for changing a specified parameter of the global feature extraction means in response to the attentional demand.
In this case, the learning means recalculates the probability models after the parameter is changed, and the behavior recognition means then recognizes the behavior of the mobile unit again using the recalculated probability models.
Other features and embodiments of the invention will be apparent to those skilled in the art from the following detailed description with reference to the attached drawings.
Now some preferred embodiments of the invention will be described below with reference to the attached drawings.
The recognizing process of the image recognizing apparatus 10 consists of two parts. The first is the advance learning part, in which the relationship between the behavior of a moving mobile unit and the images taken by a camera mounted on the mobile unit is learned. The second is the behavior recognizing part, in which the behavior of the mobile unit is recognized from a newly taken image using the knowledge learned in the advance learning part.
In the advance learning part, the behavior command output block 12, the local feature extraction block 16, the global feature extraction block 18, the learning block 20 and the memory 22 are employed. In the behavior recognizing part, in addition to those blocks, the behavior recognition block 24, the behavior assessment block 26, the attention generation block 28 and the attentional modulation block 30 are employed.
First, each block employed in the advance learning part is described.
The behavior command output block 12 outputs behavior commands to a mobile unit 32. As used herein, the term “behavior command” means a command that causes the mobile unit to go straight, make a right turn, or make a left turn. The behavior commands are output according to an instruction signal transmitted from an external device. In an alternative embodiment, the behavior command output block 12 may read a pre-stored sequence of behavior commands and output them. In a further alternative embodiment, the mobile unit 32 recognizes its own behavior from the acquired image and determines which behavior to take next, and the behavior command output block 12 outputs the behavior command according to that determination.
The behavior command is sent to the mobile unit 32 by radio or cable and causes it to move (for example, to go straight or turn left or right). The behavior command is also supplied to the global feature extraction block 18 and used to generate the global features described below.
An image acquisition block 14, such as a charge-coupled device (CCD) camera, is provided on the mobile unit 32. It acquires an image I(t) of the external environment of the mobile unit 32 at time t at every preset interval and supplies it to the local feature extraction block 16.
The local feature extraction block 16 extracts a feature vector for each local area of the image I(t). As used herein, the term “local area” means one of the small areas of equal size into which the whole image I(t) acquired by the image acquisition block 14 is divided. Each local area is composed of a plurality of pixels. In the present embodiment, the local feature extraction block 16 calculates optical flow from two temporally consecutive images I(t) and I(t+1) and uses the optical flow to generate a feature vector for each local area (hereinafter referred to as a “local feature”). The extracted local features are supplied to the global feature extraction block 18.
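The division of an image into equal-sized local areas can be sketched in Python with NumPy. This is a minimal illustration rather than the apparatus's implementation: the function name `split_into_local_areas` and the choice to discard edge pixels that do not fill a whole block are assumptions.

```python
import numpy as np

def split_into_local_areas(image: np.ndarray, block: int) -> np.ndarray:
    """Divide a 2-D grayscale image into non-overlapping block x block local areas.

    Returns an array of shape (rows, cols, block, block). Edge pixels that do
    not fill a whole block are discarded (an assumption for this sketch).
    """
    h, w = image.shape
    rows, cols = h // block, w // block
    trimmed = image[: rows * block, : cols * block]
    # (rows, block, cols, block) -> (rows, cols, block, block): one tile per local area
    return trimmed.reshape(rows, block, cols, block).swapaxes(1, 2)

if __name__ == "__main__":
    img = np.arange(64, dtype=float).reshape(8, 8)
    tiles = split_into_local_areas(img, 4)
    print(tiles.shape)     # (2, 2, 4, 4)
    print(tiles[0, 1, 0])  # first row of the top-right local area
```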
The global feature extraction block 18 combines all of the local features of the image I(t) and extracts one new feature vector, hereinafter referred to as the “global feature”. The global feature is supplied to the learning block 20.
The learning block 20 performs learning based on the global feature and generates the probability models described below. In the present embodiment, the well-known expectation maximization algorithm and supervised learning with a neural network are used for this learning; other learning algorithms may be used instead. The probability models generated by the learning are stored in the memory 22 and used to recognize the behavior of the mobile unit 32 in the behavior recognizing part.
After the advance learning part is finished, the image recognizing apparatus 10 can recognize the behavior of the mobile unit 32 accurately by applying the probability models to an image newly acquired by the image acquisition block 14.
Now each block employed in the behavior recognizing part is described.
As noted above, the image acquisition block 14 acquires an image I(t) of the external environment of the mobile unit 32 at time t at every preset interval; in this part, it supplies the image to the behavior recognition block 24. The behavior recognition block 24 applies the probability models stored in the memory 22 to the supplied image I(t). The block 24 then calculates a “confidence” for each of the behavior commands and recognizes the behavior of the mobile unit 32 based on the confidences.
The confidence is supplied to the behavior assessment block 26, which calculates the logarithmic likelihood of the confidence. If the logarithmic likelihood of the confidence is larger than a predetermined value, no further operation is performed. If it is equal to or less than the predetermined value, the attention generation block 28 generates an attentional demand signal and supplies it to the attentional modulation block 30.
Upon receiving the attentional demand, the attentional modulation block 30 changes (or modulates) a specified parameter of the learning algorithm and causes the learning block 20 to update the probability models stored in the memory 22. The behavior recognition block 24 then recognizes the behavior of the mobile unit 32 again using the updated probability models. In this way, the accuracy of behavior recognition can be expected to stay above a certain level.
While the image acquisition block 14 must be installed on the mobile unit 32, the image recognizing apparatus 10 need not be installed on the mobile unit 32 and may be located elsewhere. When installed on the mobile unit 32, the image recognizing apparatus 10 may be either integral with or separate from the image acquisition block 14. Communication between the image acquisition block 14 and the image recognizing apparatus 10 may be carried out by cable or radio transmission.
The image recognizing apparatus 10 may be implemented wholly or partly by, for example, a computer executing a program configured to perform the processes noted above.
The process in the advance learning part is now described in detail with reference to
While the mobile unit 32 moves around a given environment according to behavior commands from the behavior command output block 12, the image acquisition block 14 acquires an image I(t) at time t (step S42). The local feature extraction block 16 extracts local features from the acquired images I(t) and I(t+1) (steps S44 to S48). More specifically, Gabor filters are applied to each local area image within the acquired images to calculate the image intensity Ei(xt,yt). A plurality of image intensities Ei(xt,yt), one for each direction of the Gabor filters, are obtained for each local area according to the following Eq. (1) (step S44).
$$E_i(x_t,y_t) = \mathrm{Img}(t)\cdot\mathrm{Gbr}_i(+) + \mathrm{Img}(t+1)\cdot\mathrm{Gbr}_i(-) \qquad (1)$$
Here, Gbri(+) and Gbri(−) represent the positive and negative components of the Gabor filter, respectively. The subscript “i” represents the direction of the Gabor filter; in the embodiment, “i” ranges from 1 to 8. Img(t) represents a local area image within the image I(t) acquired at a certain time t, and Img(t+1) represents the corresponding local area image within the image I(t+1) acquired at the consecutive time t+1. (xt,yt) denotes the coordinates of a pixel in a local area at time t. Therefore, Ei(xt,yt) represents the image intensity of the local area image in direction i.
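Eq. (1) can be illustrated with a small NumPy sketch. Several details are assumptions not given in the text: the Gabor kernel is odd-phase (sine carrier) so that a direction and its opposite give different filters; the kernel size, wavelength and envelope width are arbitrary; Gbri(+) and Gbri(−) are taken to be the positive and negative parts of the kernel; “·” is read as a filter response (a sliding weighted sum) computed over the whole image before it is divided into local areas; and the eight directions are spaced 45 degrees apart. All function names are hypothetical.

```python
import numpy as np

def gabor_kernel(theta, size=9, wavelength=4.0, sigma=2.0):
    """Odd-phase (sine) Gabor kernel oriented along direction theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)          # coordinate along theta
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    # The sine carrier flips sign for theta + pi, so opposite directions differ.
    return envelope * np.sin(2.0 * np.pi * u / wavelength)

def correlate_valid(img, kernel):
    """Slide `kernel` over `img` without padding and return the weighted sums."""
    windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def direction_intensities(img_t, img_t1, n_dirs=8, **kernel_args):
    """E_i = Img(t) . Gbr_i(+) + Img(t+1) . Gbr_i(-) for every direction i (Eq. (1))."""
    out = []
    for i in range(n_dirs):
        g = gabor_kernel(2.0 * np.pi * i / n_dirs, **kernel_args)
        g_pos = np.clip(g, 0.0, None)     # positive component Gbr_i(+)
        g_neg = np.clip(g, None, 0.0)     # negative component Gbr_i(-)
        out.append(correlate_valid(img_t, g_pos) + correlate_valid(img_t1, g_neg))
    return np.stack(out)   # shape (n_dirs, H - size + 1, W - size + 1)
```

With two identical frames the two components recombine into the ordinary Gabor response, so opposite directions give responses of opposite sign; motion between I(t) and I(t+1) breaks that symmetry. Direction indices here run from 0 to 7 rather than 1 to 8.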
The directions in which the Gabor filters are applied and the number of Gabor filters are arbitrary. In the embodiment, imitating the receptive fields of the human visual system, Gabor filters in eight directions extending radially at equal angles from the center of the whole image are employed.
For each local area, the local feature extraction block 16 selects the direction j having the largest image intensity among the image intensities Ei(xt,yt) (i=1, . . . , 8) according to the following Eq. (2) (step S46).
$$j = \arg\max_i E_i(x_t,y_t) \qquad (2)$$
Note that the selected direction j may differ from one local area to another.
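Eq. (2) leaves open how the per-pixel intensities of a local area are reduced to a single value per direction. The hypothetical sketch below assumes they are summed over the area's pixels before the argmax is taken; its input is an array of per-direction intensities of shape (8, H, W).

```python
import numpy as np

def dominant_direction(E, block):
    """Index j of the strongest direction for each block x block local area (Eq. (2)).

    E has shape (n_dirs, H, W). Summing over each area's pixels is an assumption.
    Returns an int array of shape (rows, cols) with values in 0..n_dirs-1.
    """
    n, h, w = E.shape
    rows, cols = h // block, w // block
    tiles = E[:, : rows * block, : cols * block].reshape(n, rows, block, cols, block)
    area_intensity = tiles.sum(axis=(2, 4))   # shape (n_dirs, rows, cols)
    return area_intensity.argmax(axis=0)      # argmax over directions
```

Ties resolve to the lowest direction index, which is NumPy's `argmax` behavior.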
The local feature extraction block 16 then applies a Gaussian function to the largest image intensity Ej(xt,yt) according to the following Eq. (3) to obtain the local feature Ψj(xt,yt) for each local area (step S48).
In Eq. (3), “μj” is the average of the image intensities Ej(xt,yt) and “σj” is their variance. The local feature Ψj(xt,yt) is therefore an expression of the probability density distribution of the image intensity Ej(xt,yt) in each local area image for the direction having the largest intensity. As many local features Ψj(xt,yt) are calculated as there are local areas, and the direction j for which Ψj(xt,yt) is calculated may differ from one local area to another.
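Eq. (3) itself is not reproduced in this text. Based on the description of Ψj as a probability density with mean μj and variance σj, the sketch below assumes it is the ordinary normal density evaluated at each pixel of a local area, with μj and σj computed over that area's dominant-direction intensities. The function name and the small constant `eps`, a safeguard for perfectly uniform areas, are additions for illustration.

```python
import numpy as np

def local_feature(E_j, eps=1e-12):
    """Psi_j(x, y) for one local area: normal density of its dominant-direction intensities.

    mu_j and sigma_j (a variance, as the text states) are computed over the area's pixels.
    """
    E_j = np.asarray(E_j, dtype=float)
    mu = E_j.mean()
    var = E_j.var() + eps        # eps avoids division by zero for a uniform area
    return np.exp(-(E_j - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
```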
Upon receiving the local features Ψj(xt,yt) from the local feature extraction block 16 and the behavior command from the behavior command output block 12, the global feature extraction block 18 combines all of the local features Ψj(xt,yt) for each direction j of largest image intensity to obtain the global feature ρj(χt|l) according to Eq. (4) (step S50).
$$\rho_j(\chi_t \mid l) = \int_{\chi_t} \Psi_j(x_t,y_t)\, d\chi_t \qquad (4)$$
Here, “χt” denotes the two-dimensional Cartesian coordinates (xt,yt).
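Over a discrete pixel grid, the integral in Eq. (4) becomes a sum. The sketch below assumes that, for each direction j, the local features Ψj of all local areas whose dominant direction is j are summed over their pixels, giving one value per direction, that is, an eight-dimensional global feature. The array layout and the function name are illustrative assumptions.

```python
import numpy as np

def global_feature(psi, directions, n_dirs=8):
    """Discrete stand-in for Eq. (4): one accumulated value per direction j.

    psi: (rows, cols, block, block) local features Psi_j of every local area.
    directions: (rows, cols) dominant direction index of each area, in [0, n_dirs).
    """
    per_area = psi.sum(axis=(2, 3))   # "integral" of Psi over each local area
    # Add each area's total into the bin of its dominant direction.
    return np.bincount(directions.ravel(), weights=per_area.ravel(), minlength=n_dirs)
```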
The calculated global feature ρj(χt|l) is assigned to one of several classes according to the behavior command output by the behavior command output block 12 when the image I(t) was acquired, and is stored in memory (step S52). Here, “l” represents the behavior command. In the present embodiment, three behavior commands (going straight, turning left and turning right) are used: l=1 corresponds to going straight, l=2 to turning left, and l=3 to turning right. The global features ρj obtained while the mobile unit is going straight (l=1), turning left (l=2) or turning right (l=3) are therefore stored in different classes.
These classes are called “attention classes” Ωl. As used herein, the term “attention class” denotes a class used to update the learning result efficiently by attending to particular features when a new feature is presented, rather than reflecting all of them. Each attention class is identical to the probability model of the corresponding behavior.
Note that the number of attention classes is not limited to three; any number of attention classes may be employed, corresponding to the number of behavior commands.
Because the global feature ρj(χt|l) is calculated in association with the behavior command for each image acquired at time t, many sets of global features for the eight directions are stored for each behavior command.
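Sorting the global features into attention classes by behavior command (step S52) amounts to a keyed store. The minimal sketch below uses the command numbering given above (l=1 straight, l=2 left, l=3 right); the class name `AttentionClasses` and its methods are hypothetical.

```python
from collections import defaultdict

import numpy as np

BEHAVIORS = {1: "straight", 2: "left", 3: "right"}   # l -> behavior command, as in the text

class AttentionClasses:
    """Store each global feature rho(t) in the attention class Omega_l of the
    behavior command l that was issued when the image I(t) was acquired."""

    def __init__(self):
        self._store = defaultdict(list)

    def add(self, rho, command):
        if command not in BEHAVIORS:
            raise ValueError(f"unknown behavior command {command}")
        self._store[command].append(np.asarray(rho, dtype=float))

    def samples(self, command):
        """All global features stored for `command`, shape (n_samples, N)."""
        return np.array(self._store[command])
```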
Upper part (a) of
Comparing the polar-shaped maps in part (c) of
Back to
The EM algorithm is an iterative algorithm for estimating the parameter θ that maximizes the likelihood when the observed data are regarded as incomplete data. Denoting the mean of the observed data by μl and the covariance by Σl, the parameter θ may be written as θ(μl,Σl). The EM algorithm starts from appropriate initial values of θ(μl,Σl), which are then updated repeatedly by alternating the Expectation (E) step and the Maximization (M) step.
In the E step, the conditional expected value φ(θ|θ(k)) is calculated according to the following Eq. (5):
$$\phi(\theta \mid \theta^{(k)}) = \sum_i \sum_l p(\rho_l \mid \Omega_l; \theta^{(k)}) \log p(\rho_l, \Omega_l; \theta^{(k)}) \qquad (5)$$
Then, in the M step, the parameters μl and Σl that maximize φ(θ|θ(k)) are calculated by the following Eq. (6) and form the new estimate θ(k+1):
$$\theta^{(k+1)} = \arg\max_{\theta} \phi(\theta \mid \theta^{(k)}) \qquad (6)$$
After the E and M steps have been repeated, the conditional expected value φ(θ|θ(k)) is obtained. By partially differentiating φ(θ|θ(k)) with respect to θ and setting the result equal to zero, the parameters μl and Σl are finally calculated. A more detailed explanation is omitted because the EM algorithm is well known in the art.
Using the EM algorithm, the global feature of each attention class Ωl can be expressed by a normal distribution (step S54).
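To make the E and M steps concrete, the following is a standard textbook EM for a Gaussian mixture, fitted to the global features of one attention class. It is a sketch under stated assumptions, not the patent's exact formulation (Eq. (5) above is written in a different form): the means are initialized by farthest-point selection, a small ridge `reg` keeps the covariances invertible, and iteration stops when the log-likelihood gain falls below `tol`. The function names are hypothetical.

```python
import numpy as np

def _log_gauss(X, mean, cov):
    """log N(x | mean, cov) for every row x of X, via a Cholesky factor."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)              # whitened residuals, shape (d, n)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + np.sum(z ** 2, axis=0))

def em_gmm(X, m, n_iter=200, reg=1e-6, tol=1e-8, seed=0):
    """Fit an m-component Gaussian mixture to the rows of X (one attention class).

    Returns (weights, means mu_l, covariances Sigma_l, log-likelihood at the last E step).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialization of the means (an assumption of this sketch).
    idx = [int(rng.integers(n))]
    for _ in range(1, m):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    covs = np.repeat(base_cov[None], m, axis=0)
    weights = np.full(m, 1.0 / m)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities p(component k | x_i; theta^(k))
        log_p = np.stack([np.log(weights[k]) + _log_gauss(X, means[k], covs[k])
                          for k in range(m)], axis=1)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        ll = float(log_norm.sum())
        # M step: closed-form maximizers of the expected complete-data log-likelihood
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(m):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, ll
```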
The global feature extraction block 18 uses the calculated μl and Σl in the following Eq. (7) to calculate the prior probability
Here, N is the number of dimensions of global feature ρj(χt|l).
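Eq. (7) itself is not reproduced in this text. Because the later description speaks of incrementing a Gaussian mixture number m, and N is the dimension of the global feature, the sketch below assumes Eq. (7) is an m-component Gaussian mixture density over N-dimensional global features with mixing weights w_k. The function name `mixture_density` is hypothetical.

```python
import numpy as np

def mixture_density(rho, weights, means, covs):
    """Gaussian mixture density of an N-dimensional global feature rho.

    sum_k w_k (2 pi)^(-N/2) |Sigma_k|^(-1/2) exp(-(rho - mu_k)^T Sigma_k^(-1) (rho - mu_k) / 2)
    """
    rho = np.asarray(rho, dtype=float)
    N = rho.shape[0]                                  # dimension of the global feature
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        cov = np.asarray(cov, dtype=float)
        diff = rho - np.asarray(mu, dtype=float)
        maha = diff @ np.linalg.solve(cov, diff)      # squared Mahalanobis distance
        norm = (2.0 * np.pi) ** (-N / 2.0) / np.sqrt(np.linalg.det(cov))
        total += w * norm * np.exp(-0.5 * maha)
    return float(total)
```

With a single standard normal component in one dimension, the density at the mean is 1/√(2π), about 0.399.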
Next, supervised learning with a neural network is described. In this learning, the conditional probability density function p(I(t)|Ωl) is calculated for the image I(t), with the attention class as the supervising signal (step S58).
In
Through such supervised learning with a neural network, a direct relationship between the image I(t) and the attention class Ωl, that is, the conditional probability density function p(I(t)|Ωl), can be obtained.
The processes in steps S54 to S58 are executed for every behavior command l. Therefore, in the present embodiment, the prior probability
The probability model calculated by the learning block 20 is stored in the memory 22 (step S60). If advance learning is to be continued, “yes” is selected in step S62, the series of processes from step S42 to step S60 is repeated, and the probability model is updated. While the mobile unit 32 is moving, advance learning is executed for all of the images I(t) acquired at time t. When the probability model is judged to be accurate enough to recognize the behavior of the mobile unit 32 (for example, when processing has been completed for a predetermined number of images I(t)), the process ends (step S64).
Referring to
The image acquisition block 14 acquires two new images at time t at every preset interval (step S82). The probability models (prior probability
Then, the largest of the three calculated confidences p(Ω1(t)), p(Ω2(t)) and p(Ω3(t)) is selected (step S86).
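The confidence computation can be illustrated with Bayes' rule. The exact combination of the learned terms is not spelled out in this excerpt; the sketch assumes that the confidence of class Ωl is the normalized product of a likelihood term p(I(t)|Ωl) and a class prior, computed in the log domain for numerical stability, and that the recognized command is the class with the largest confidence (step S86). The function names are hypothetical.

```python
import numpy as np

def confidences(log_likelihood, log_prior):
    """Posterior confidence of each attention class via Bayes' rule.

    p(Omega_l | I(t)) = p(I(t) | Omega_l) p(Omega_l) / sum_k p(I(t) | Omega_k) p(Omega_k)
    """
    log_post = np.asarray(log_likelihood, dtype=float) + np.asarray(log_prior, dtype=float)
    log_post = log_post - np.logaddexp.reduce(log_post)   # normalize in the log domain
    return np.exp(log_post)

def recognize(log_likelihood, log_prior):
    """Return the recognized behavior command l (1-based) and all confidences."""
    conf = confidences(log_likelihood, log_prior)
    return int(np.argmax(conf)) + 1, conf
```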
The behavior assessment block 26 determines whether the logarithmic likelihood of the confidence p(Ωl(t)) is larger than a predetermined value K (step S88). If log p(Ωl(t)) > K, the behavior command l corresponding to the attention class Ωl with the largest confidence is recognized as the behavior of the mobile unit 32 (step S92). Otherwise, that is, if log p(Ωl(t)) ≤ K, the attention generation block 28 generates an attentional demand, and the attentional modulation block 30 increments the Gaussian mixture number “m” in Eq. (7) by a specified value (that is, performs attentional modulation) (step S90). Then, in the learning block 20, the series of processes in steps S56 to S60 in
The process then returns to step S84, and steps S84 to S88 are repeated. Thus, the Gaussian mixture number m is increased until the logarithmic likelihood log p(Ωl(t)) exceeds the predetermined value K. In an alternative embodiment, the probability model generated once may always be used without the updating process.
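Steps S84 to S92 form a loop that keeps raising the mixture number m until the best confidence passes the threshold K. In the sketch below, `score(m)` is a hypothetical callback standing in for relearning the probability models with m mixture components and recomputing the confidences; the upper bound `m_max` is an added safeguard that the text does not mention.

```python
import math
from typing import Callable, Sequence, Tuple

def recognize_with_attention(
    score: Callable[[int], Sequence[float]],
    K: float,
    m0: int = 1,
    step: int = 1,
    m_max: int = 20,
) -> Tuple[int, int, float]:
    """Return (recognized command l, final mixture number m, log of best confidence).

    score(m) is a hypothetical callback: relearn the models with m mixture
    components and return the confidences p(Omega_l(t)) for l = 1..L.
    """
    m = m0
    while True:
        conf = list(score(m))
        best = max(range(len(conf)), key=conf.__getitem__)  # step S86: largest confidence
        log_conf = math.log(conf[best])
        if log_conf > K:                                     # step S88 passed
            return best + 1, m, log_conf                     # step S92
        if m + step > m_max:                                 # safeguard, not in the text
            return best + 1, m, log_conf
        m += step                                            # step S90: attentional modulation
```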
As discussed above, the image recognizing apparatus according to the invention does not recognize the behavior of the mobile unit based only on the image. Instead, it first completes learning of the relationship between the global feature extracted from the image and the behavior command, and then recognizes the behavior using the learning result. The apparatus can therefore recognize the behavior of the mobile unit rapidly and accurately in real applications.
In addition, when the mobile unit 32 cannot move correctly according to the supplied behavior command because of, for example, incorrectly mounted wheels, the image recognizing apparatus can obtain the actual motion status of the mobile unit through the behavior recognizing process.
Now one example of the invention will be described.
The receiver 136 receives a behavior command signal from the external device and supplies it to the behavior command output block 12. The RC car 100 goes straight, turns left or turns right according to the behavior command. The transmitter 138 transmits the behavior of the RC car 100 recognized by the behavior recognition block 24 to the external device.
The results of behavior recognition for 24 image frames after completion of the advance learning part are described below.
Referring to
Referring to
As described above, the image recognizing apparatus according to the invention improves reliability by repeating learning in the attention classes generated in a bottom-up fashion in the advance learning part. In the behavior recognizing part, behavior recognition accuracy is improved because the learning result is updated until the logarithmic likelihood of the confidence exceeds the predetermined value.
According to the invention, instead of recognizing the behavior of a mobile unit based only on images, the image recognizing apparatus learns the relationship between images and behavior commands in advance and determines the behavior using the learning result. It can therefore recognize the behavior of the mobile unit rapidly and accurately in real applications.
Although the invention has been described in detail in terms of specific embodiments, it is not intended to be limited to those embodiments. Those skilled in the art will appreciate that various modifications can be made without departing from the scope of the invention.
Foreign Application Priority Data

Number | Date | Country | Kind |
---|---|---|---|
2001-134848 | May 2001 | JP | national |
2002-107013 | Apr 2002 | JP | national |
References Cited: U.S. Patent Documents

Number | Name | Date | Kind |
---|---|---|---|
4543660 | Maeda | Sep 1985 | A |
5109425 | Lawton | Apr 1992 | A |
5307419 | Tsujino et al. | Apr 1994 | A |
5500904 | Markandey et al. | Mar 1996 | A |
5751838 | Cox et al. | May 1998 | A |
6272237 | Hashima | Aug 2001 | B1 |
6307959 | Mandelbaum et al. | Oct 2001 | B1 |
6323807 | Golding et al. | Nov 2001 | B1 |
6353679 | Cham et al. | Mar 2002 | B1 |
6532305 | Hammen | Mar 2003 | B1 |
6683677 | Chon et al. | Jan 2004 | B2 |
20020041324 | Satoda | Apr 2002 | A1 |
Foreign Patent Documents

Number | Date | Country |
---|---|---|
11-134504 | May 1999 | JP |
2000-171250 | Jun 2000 | JP |
Prior Publication Data

Number | Date | Country |
---|---|---|
20030007682 A1 | Jan 2003 | US |