The present invention relates to object posture estimation, and particularly to a training method and a training apparatus for object posture orientation estimation, and to a method and an apparatus for estimating the posture orientation of an object in an image.
Methods of estimating the posture of an object (e.g., a human, an animal, or another object) in a single image may be divided into model-based methods and learning-based methods according to their technical principles. In the learning-based methods, three-dimensional (3-D) postures of objects are deduced directly from image features. An often used image feature is object outline information.
Posture orientations of objects are not distinguished in existing methods of object posture estimation. Because of the complexity of object posture variation, different posture orientations of objects may introduce further ambiguity into the estimation. Therefore, the accuracy of posture estimation over images with different orientations is far lower than that of posture estimation under a single orientation.
In view of the above deficiencies of the prior art, the present invention is intended to provide a method and an apparatus for training based on input images, and a method and an apparatus for estimating a posture orientation of an object in an image, to facilitate distinguishing object posture orientations in the object posture estimation.
An embodiment of the present invention is a method of training based on input images, including: extracting an image feature from each of a plurality of input images each having an orientation class; with respect to each of a plurality of orientation classes, estimating a mapping model for transforming image features extracted from input images of the orientation class into 3-D object posture information corresponding to the input images through a linear regression analysis; and calculating a joint probability distribution model based on samples obtained by connecting the image features with their corresponding 3-D object posture information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different orientation classes, and each of the single probability distribution models is based on samples including the image features extracted from the input images of a corresponding orientation class.
Another embodiment of the present invention is an apparatus for training based on input images, including: an extracting unit which extracts an image feature from each of a plurality of input images each having an orientation class; a map estimating unit which, with respect to each of a plurality of orientation classes, estimates a mapping model for transforming image features extracted from input images of the orientation class into 3-D object posture information corresponding to the input images through a linear regression analysis; and a probability model calculating unit which calculates a joint probability distribution model based on samples obtained by connecting the image features with their corresponding 3-D object posture information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different orientation classes, and each of the single probability distribution models is based on samples including the image features extracted from the input images of a corresponding orientation class.
According to the embodiments of the present invention, the input images have respective orientation classes. It is possible to extract an image feature from each input image. For each orientation class, it is possible to estimate the mapping model through the linear regression analysis; such a mapping model acts as a function converting image features of the orientation class into the corresponding 3-D object posture information. It is possible to connect each image feature with its corresponding 3-D object posture information to obtain a sample, and to calculate the joint probability distribution model based on these samples. The joint probability distribution model is based on a number of single probability distribution models, one for each orientation class; a corresponding single probability distribution model is obtained from the samples including the image features of the respective orientation class. Therefore, according to the embodiments of the present invention, it is possible to train a model for object posture orientation estimation, that is, the mapping models and the joint probability distribution model for the posture orientations.
Further, in the embodiments, it is possible to calculate, with a dimension reduction method, a feature transformation model for reducing the dimensions of the image features. Accordingly, it is possible to transform the image features by using the feature transformation model, for use in estimating the mapping model and calculating the joint probability distribution model. The image features transformed through the feature transformation model have fewer dimensions, reducing the subsequent processing cost for estimation and calculation.
Another embodiment of the present invention is a method of estimating a posture orientation of an object in an image, including: extracting an image feature from an input image; with respect to each of a plurality of orientation classes, obtaining 3-D object posture information corresponding to the image feature based on a mapping model corresponding to the orientation class, for mapping the image feature to the 3-D object posture information; calculating a joint probability of a joint feature including the image feature and the corresponding 3-D object posture information for each of the orientation classes according to a joint probability distribution model based on single probability distribution models for the orientation classes; calculating a conditional probability of the image feature conditioned on the corresponding 3-D object posture information based on the joint probability; and estimating the orientation class corresponding to the maximum of the conditional probabilities as the posture orientation of the object in the input image.
Another embodiment of the present invention is an apparatus for estimating a posture orientation of an object in an image, including: an extracting unit which extracts an image feature from an input image; a mapping unit which, with respect to each of a plurality of orientation classes, obtains 3-D object posture information corresponding to the image feature based on a mapping model corresponding to the orientation class, for mapping the image feature to the 3-D object posture information; a probability calculating unit which calculates a joint probability of a joint feature including the image feature and the corresponding 3-D object posture information for each of the orientation classes according to a joint probability distribution model based on single probability distribution models for the orientation classes, and calculates a conditional probability of the image feature conditioned on the corresponding 3-D object posture information based on the joint probability; and an estimating unit which estimates the orientation class corresponding to the maximum of the conditional probabilities as the posture orientation of the object in the input image.
According to the embodiments of the present invention, it is possible to extract an image feature from the input image. Because each orientation class has a corresponding mapping model for converting image features of that orientation class into 3-D object posture information, it is possible to assume in turn that the image feature belongs to each of the orientation classes, and to obtain the 3-D object posture information corresponding to the image feature by using the corresponding mapping model. According to the joint probability distribution model, it is possible to calculate, for each assumed orientation class, the joint probability that the image feature and the corresponding 3-D object posture information occur together. From the joint probabilities, it is possible to calculate the conditional probabilities that the image feature occurs on condition that the corresponding 3-D object posture information occurs. The orientation class assumption corresponding to the maximum conditional probability may then be estimated as the posture orientation of the object in the input image. Therefore, according to the embodiments of the present invention, it is possible to estimate the object posture orientation.
Further, in the embodiments, it is possible to transform the image feature with a feature transformation model for dimension reduction before obtaining the 3-D object posture information. The image feature transformed through the feature transformation model has fewer dimensions, reducing the subsequent processing cost for mapping and probability calculation.
As described above, posture orientations of objects are not distinguished in existing methods of object posture estimation. Because of the complexity of object posture variation, different posture orientations of objects may introduce great ambiguity into the estimation, so that the accuracy of posture estimation over images with different orientations is far lower than that of posture estimation under a single orientation. An object of the present invention is therefore to estimate the orientation of objects in images and videos, so as to further estimate the object posture under a single orientation. According to experimental results, the present invention can estimate the posture of objects in images and videos effectively.
The above and/or other aspects, features and/or advantages of the present invention will be easily appreciated in view of the following description by referring to the accompanying drawings. In the accompanying drawings, identical or corresponding technical features or components will be represented with identical or corresponding reference numbers.
The embodiments of the present invention are described below by referring to the drawings. It is to be noted that, for the purpose of clarity, representations and descriptions of components and processes that are known to those skilled in the art but unrelated to the present invention are omitted from the drawings and the description.
As illustrated in FIG. 1, the apparatus 100 for training based on input images includes an extracting unit 101, a map estimating unit 102 and a probability model calculating unit 103.
The input images are images including objects having various posture orientation classes. The posture orientation classes represent the different orientations assumed by the objects. For example, the posture orientation classes may include −80°, −40°, 0°, +40° and +80°, where −80° represents that the object is turned 80 degrees to the right relative to the lens of the camera, −40° represents that the object is turned 40 degrees to the right relative to the lens of the camera, 0° represents that the object faces the lens of the camera, +40° represents that the object is turned 40 degrees to the left relative to the lens of the camera, and +80° represents that the object is turned 80 degrees to the left relative to the lens of the camera.
Of course, the posture orientation classes may also represent orientation ranges. For example, the 180° range from the orientation in which the object faces the left side to the orientation in which the object faces the right side may be divided into 5 orientation ranges, [−90°, −54°], [−54°, −18°], [−18°, 18°], [18°, 54°] and [54°, 90°], that is, 5 posture orientation classes.
The number of the posture orientation classes and the specific posture orientations represented by the classes may be set arbitrarily as required, and are not limited to the above examples.
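As a simple illustration of such a division into ranges, the following Python sketch quantizes a continuous orientation angle into one of the 5 classes of the above example; the function name and the assumption of equal-width ranges over [−90°, 90°] are illustrative only, not part of the invention.

```python
def orientation_class(angle_deg, n_classes=5, lo=-90.0, hi=90.0):
    """Quantize an orientation angle (in degrees) into one of
    n_classes equal-width ranges covering [lo, hi], e.g. the five
    36-degree ranges [-90,-54], ..., [54,90] of the example above."""
    idx = int((angle_deg - lo) / (hi - lo) * n_classes)
    return min(max(idx, 0), n_classes - 1)  # clamp boundary angles

# orientation_class(-80.0) -> 0, orientation_class(0.0) -> 2
```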
In an embodiment of the present invention, the input images and the corresponding posture orientation classes are supplied to the apparatus 100.
Preferably, the input images include both object images with various posture orientations and no background, and object images with various posture orientations against backgrounds.
The extracting unit 101 extracts an image feature from each of a plurality of input images each having an orientation class. The image feature may be any of various features used for object posture estimation. Preferably, the image feature is a statistical feature relating to edge directions in the input images, for example, a gradient orientation histogram (HOG) feature or a scale-invariant feature transform (SIFT) feature.
In a specific example, it is assumed that the gradient orientation histogram feature is adopted as the image feature, and that the input images have the same width and height (120 pixels × 100 pixels). However, the embodiments of the present invention are not limited to this specific feature and size.
In this example, the extracting unit 101 may calculate gradients in the horizontal direction and in the vertical direction for each pixel in the input images, that is,
Horizontal gradient: Ix(x, y) = dI(x, y)/dx = I(x+1, y) − I(x−1, y)
Vertical gradient: Iy(x, y) = dI(x, y)/dy = I(x, y+1) − I(x, y−1)
where I(x, y) represents the grey scale value of a pixel, and x and y respectively represent the coordinates of the pixel in the horizontal direction and the vertical direction.
Then, the extracting unit 101 may calculate the gradient orientation and the gradient intensity of each pixel in the input images according to the gradients of the pixel in the horizontal direction and in the vertical direction.
Gradient orientation: θ(x, y) = arctan(|Iy/Ix|)
Gradient intensity: Grad(x, y) = √(Ix² + Iy²)
where the range of the gradient orientation θ(x,y) is [0, π].
In this example, the extracting unit 101 may extract 24 blocks of size 32×32 one by one, from left to right and from top to bottom, with 6 blocks in each row in the horizontal direction and 4 blocks in each column in the vertical direction. Any two blocks adjacent in the horizontal direction or the vertical direction overlap each other by half.
The extracting unit 101 may divide each 32×32 block into 16 small blocks of size 8×8, with 4 small blocks in each row in the horizontal direction and 4 small blocks in each column in the vertical direction. The small blocks are arranged in the horizontal direction first and then in the vertical direction.
For each 8×8 small block, the extracting unit 101 calculates a gradient orientation histogram over the 64 pixels in the small block, where the gradient orientations are divided into 8 direction bins, that is, each interval of π/8 in the range from 0 to π is one direction bin. That is to say, for each of the 8 direction bins, the sum of the gradient intensities of those pixels whose gradient orientations fall within the bin is calculated over the 64 pixels of each 8×8 small block, thus obtaining an 8-dimensional vector. Accordingly, a 128-dimensional vector is obtained for each 32×32 block.
For each input image, the extracting unit 101 obtains the image feature by concatenating the vectors of the blocks in sequence; the number of dimensions of the image feature is therefore 128×24=3072.
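For concreteness, a minimal Python sketch of the feature extraction described above follows. It assumes numpy and follows the example's parameters (32×32 blocks with half overlap, 8×8 small blocks, 8 orientation bins weighted by gradient intensity); folding atan2 into [0, π) is one common realization of the stated orientation range, and the number of blocks produced depends on the image size and stride rather than being fixed at 24.

```python
import numpy as np

def extract_feature(img):
    """Sketch of the gradient orientation histogram feature described
    above: 32x32 blocks with half overlap, each split into 8x8 small
    blocks, 8 orientation bins weighted by gradient intensity."""
    img = img.astype(np.float64)
    Ix = np.zeros_like(img)
    Iy = np.zeros_like(img)
    Ix[:, 1:-1] = img[:, 2:] - img[:, :-2]     # I(x+1,y) - I(x-1,y)
    Iy[1:-1, :] = img[2:, :] - img[:-2, :]     # I(x,y+1) - I(x,y-1)
    grad = np.sqrt(Ix ** 2 + Iy ** 2)          # gradient intensity
    theta = np.mod(np.arctan2(Iy, Ix), np.pi)  # orientation in [0, pi)
    feat = []
    h, w = img.shape
    for by in range(0, h - 31, 16):            # blocks overlap by half
        for bx in range(0, w - 31, 16):
            for cy in range(by, by + 32, 8):   # 16 small blocks per block
                for cx in range(bx, bx + 32, 8):
                    t = theta[cy:cy + 8, cx:cx + 8].ravel()
                    g = grad[cy:cy + 8, cx:cx + 8].ravel()
                    bins = np.minimum((t / (np.pi / 8)).astype(int), 7)
                    feat.append(np.bincount(bins, weights=g, minlength=8))
    return np.concatenate(feat)                # one long feature vector
```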
It is to be noted that the embodiments of the present invention are not limited to the division scheme and the specific numbers of blocks and small blocks in the above example; other division schemes and numbers may also be adopted. Likewise, the embodiments of the present invention are not limited to the feature extraction method of the above example; other methods of extracting image features for object posture estimation may also be adopted.
Returning to FIG. 1, the map estimating unit 102, with respect to each of the plurality of orientation classes, estimates a mapping model for converting image features extracted from input images of the orientation class into 3-D object posture information corresponding to the input images through a linear regression analysis.
For each input image, 3-D object posture information corresponding to the posture of an object contained in the input image is prepared in advance.
In a specific example, the image feature (feature vector) extracted from an input image is represented as Xm, where m is the number of dimensions of the image feature. All the image features extracted from n input images are represented as a matrix Xm×n. Further, 3-D object posture information (vector) corresponding to the extracted image feature Xm is represented as Yp, where p is the number of dimensions of the 3-D object posture information. 3-D object posture information corresponding to all the image features extracted from n input images is represented as a matrix Yp×n.
Assuming that Yp×n = Ap×m × Xm×n, it is possible to calculate Ap×m such that ‖Yp×n − Ap×m × Xm×n‖² is minimized through a linear regression analysis, e.g., a least squares method. Ap×m is the mapping model.
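A minimal numpy sketch of this least-squares estimation is given below; the data layout (one sample per column) follows the notation above, while the function name and the use of numpy.linalg.lstsq are assumptions of this illustration.

```python
import numpy as np

def estimate_mapping_model(X, Y):
    """Estimate A (p x m) minimizing ||Y - A X||^2 by least squares.
    X: m x n image features of one orientation class (one per column).
    Y: p x n corresponding 3-D posture vectors."""
    # lstsq solves X^T A^T ~= Y^T; transpose back to obtain A.
    A_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return A_T.T

# One mapping model is estimated per orientation class, e.g.:
# models = [estimate_mapping_model(X_c, Y_c) for (X_c, Y_c) in class_data]
```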
Returning to FIG. 1, the probability model calculating unit 103 calculates a joint probability distribution model based on samples obtained by connecting the image features with their corresponding 3-D object posture information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different orientation classes, and each of the single probability distribution models is based on the samples including the image features extracted from the input images of the corresponding orientation class.
That is to say, the joint probability distribution model is based on the single probability distribution models for the different orientation classes. Through a known method, it is possible to calculate each single probability distribution model (i.e., its model parameters) based on the set of samples of the corresponding orientation class, and to calculate the joint probability distribution model (i.e., its model parameters) over the single probability distribution models of all the posture orientation classes.
Suitable joint probability distribution models include, but are not limited to, a Gaussian mixture model, a hidden Markov model and a conditional random field.
In a specific example, the Gaussian mixture model is adopted. In this example, a joint feature (i.e., sample) [X,Y]T is formed by an image feature (vector) X and 3-D object posture information (vector) Y. It is assumed that the joint feature [X,Y]T obeys the probability distribution equation:

p([X,Y]T) = ρ1·N([X,Y]T|u1, Σ1) + … + ρM·N([X,Y]T|uM, ΣM)
where M is the number of the posture orientation classes, N(·|ui, Σi) is the single Gauss model (i.e., a normal distribution model) for posture orientation class i, ui and Σi are the parameters of the normal distribution model, and ρi represents the weight of the single Gauss model for posture orientation class i in the Gaussian mixture model. It is possible to calculate the optimal ρi, ui and Σi, i=1, …, M, i.e., the joint probability distribution model, through a known estimating method, e.g., the Expectation-Maximization (EM) method, based on the set of joint features for all the posture orientation classes.
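As an illustration, such a mixture can be fitted with an off-the-shelf EM implementation. The sketch below uses scikit-learn's GaussianMixture, which is an assumption of this example rather than a method prescribed by the text; note that EM does not by itself guarantee that component i ends up aligned with orientation class i.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_model(Z, M):
    """Fit a Gaussian mixture with one component per orientation class.
    Z: (n_samples, m + p) array of joint features [X, Y], one per row.
    Returns the fitted mixture model."""
    gmm = GaussianMixture(n_components=M, covariance_type="full")
    gmm.fit(Z)  # EM estimation of the weights, means and covariances
    return gmm

# gmm.weights_[i], gmm.means_[i] and gmm.covariances_[i] play the roles
# of rho_i, u_i and Sigma_i above (up to component ordering).
```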
As shown in FIG. 3, the method 300 of training based on input images starts at step 301. At step 303, an image feature is extracted from each of a plurality of input images each having an orientation class.
At step 305, with respect to each of the plurality of orientation classes, a mapping model for converting image features extracted from input images of the orientation class into 3-D object posture information corresponding to the input images is estimated through a linear regression analysis. That is to say, for each posture orientation class, it is assumed that there is a certain functional or mapping relation by which the image features extracted from the input images of the posture orientation class can be converted or mapped to the 3-D object posture information corresponding to the input images. Through the linear regression analysis, it is possible to estimate such functional or mapping relation, i.e., mapping model based on the extracted image features and the corresponding 3-D object posture information.
For each input image, 3-D object posture information corresponding to the posture of an object contained in the input image is prepared in advance.
In a specific example, the image feature (feature vector) extracted from an input image is represented as Xm where m is the number of dimensions of the image feature. All the image features extracted from n input images are represented as a matrix Xm×n. Further, 3-D object posture information (vector) corresponding to the extracted image feature Xm is represented as Yp, where p is the number of dimensions of the 3-D object posture information. 3-D object posture information corresponding to all the image features extracted from n input images is represented as a matrix Yp×n.
Assuming that Yp×n = Ap×m × Xm×n, it is possible to calculate Ap×m such that ‖Yp×n − Ap×m × Xm×n‖² is minimized through a linear regression analysis, e.g., a least squares method. Ap×m is the mapping model. If there are Q orientation classes, Q corresponding mapping models are generated.
Then at step 307, a joint probability distribution model is calculated based on samples obtained by connecting the image features with their corresponding 3-D object posture information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different orientation classes, and each of the single probability distribution models is based on samples including the image features extracted from the input images of the corresponding orientation class.
That is to say, the joint probability distribution model is based on the single probability distribution models for the different orientation classes. Through a known method, it is possible to calculate each single probability distribution model (i.e., its model parameters) based on the set of samples of the corresponding orientation class, and to calculate the joint probability distribution model (i.e., its model parameters) over the single probability distribution models of all the posture orientation classes.
Suitable joint probability distribution models include, but are not limited to, a Gaussian mixture model, a hidden Markov model and a conditional random field.
In a specific example, the Gaussian mixture model is adopted. In this example, a joint feature (i.e., sample) [X,Y]T is formed by an image feature (vector) X and 3-D object posture information (vector) Y. It is assumed that the joint feature [X,Y]T obeys the probability distribution equation:

p([X,Y]T) = ρ1·N([X,Y]T|u1, Σ1) + … + ρM·N([X,Y]T|uM, ΣM)
where M is the number of the posture orientation classes, N(·|ui, Σi) is the single Gauss model (i.e., a normal distribution model) for posture orientation class i, ui and Σi are the parameters of the normal distribution model, and ρi represents the weight of the single Gauss model for posture orientation class i in the Gaussian mixture model. It is possible to calculate the optimal ρi, ui and Σi, i=1, …, M, i.e., the joint probability distribution model, through a known estimating method, e.g., the Expectation-Maximization (EM) method, based on the set of joint features for all the posture orientation classes.
Then the method 300 ends at step 309.
As illustrated in FIG. 4, the apparatus 400 for training based on input images includes an extracting unit 401, a map estimating unit 402, a probability model calculating unit 403, a transformation model calculating unit 404 and a feature transforming unit 405. The extracting unit 401, the map estimating unit 402 and the probability model calculating unit 403 function in the same manner as the corresponding units of the apparatus 100 described above.
The transformation model calculating unit 404 calculates a feature transformation model for reducing the dimensions of the image features by using a dimension reduction method. The dimension reduction method includes, but is not limited to, principal component analysis, factor analysis, singular value decomposition, multi-dimensional scaling, locally linear embedding, Isomap, linear discriminant analysis, local tangent space alignment, and maximum variance unfolding. The obtained feature transformation model may be used to transform the image features extracted by the extracting unit 401 into image features with fewer dimensions.
In a specific example, the image feature (feature vector) extracted from an input image is represented as Xm, where m is the number of dimensions of the image feature. All the image features extracted from n input images are represented as a matrix Xm×n. It is possible to calculate a matrix Mapd×m based on the image features Xm×n through the principal component analysis method, where d < m.
The feature transforming unit 405 transforms the image features by using the feature transformation model, for use in estimating the mapping model and calculating the joint probability distribution model. For example, in the previous example, it is possible to calculate the transformed image features through the following equation:
X′d×n = Mapd×m × Xm×n.
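A brief numpy sketch of computing Mapd×m by principal component analysis and applying the above equation follows; as an assumption of this illustration, the features are mean-centered before projection, a detail the equation above leaves implicit.

```python
import numpy as np

def fit_pca_map(X, d):
    """Compute a d x m transformation Map from m x n features X by PCA."""
    mean = X.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal axes.
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :d].T, mean        # Map (d x m) and the feature mean

# Usage, given the m x n feature matrix X from the extraction step:
# Map, mean = fit_pca_map(X, d=64)   # d < m is a design choice
# X_t = Map @ (X - mean)             # X'_{d x n} = Map_{d x m} x X_{m x n}
```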
The transformed image features (the number of dimensions is d) are supplied to the map estimating unit 402 and the probability model calculating unit 403.
In the above embodiment, because the image features transformed with the feature transformation model have fewer dimensions, the subsequent processing cost for estimation and calculation is reduced.
As shown in FIG. 5, the method 500 of training based on input images starts at step 501. At step 502, an image feature is extracted from each of a plurality of input images each having an orientation class.
At step 503, a feature transformation model for reducing the dimensions of the image features extracted at step 502 is calculated through a dimension reduction method. The dimension reduction method includes, but is not limited to, principal component analysis, factor analysis, singular value decomposition, multi-dimensional scaling, locally linear embedding, Isomap, linear discriminant analysis, local tangent space alignment, and maximum variance unfolding. The obtained feature transformation model may be used to transform the extracted image features into image features with fewer dimensions.
In a specific example, the image feature (feature vector) extracted from an input image is represented as Xm, where m is the number of dimensions of the image feature. All the image features extracted from n input images are represented as a matrix Xm×n. It is possible to calculate a matrix Mapd×m based on the image features Xm×n through the principal component analysis method, where d < m.
At step 504, the image features are transformed by using the feature transformation model, for use in estimating the mapping model and calculating the joint probability distribution model. For example, in the previous example, it is possible to calculate the transformed image features through the following equation:
X′d×n = Mapd×m × Xm×n.
At step 505, as in step 305 of the method 300, with respect to each of the plurality of orientation classes, a mapping model for converting the (already transformed) image features extracted from input images of the orientation class into 3-D object posture information corresponding to the input images is estimated through a linear regression analysis.
Then at step 507, as in step 307 of the method 300, a joint probability distribution model is calculated based on samples obtained by connecting the (already transformed) image features with their corresponding 3-D object posture information, wherein the single probability distribution models on which the joint probability distribution model is based correspond to different orientation classes, and each of the single probability distribution models is based on samples including the image features extracted from the input images of the corresponding orientation class.
Then the method 500 ends at step 509.
As illustrated in FIG. 6, the apparatus 600 for estimating a posture orientation of an object in an image includes an extracting unit 601, a mapping unit 602, a probability calculating unit 603 and an estimating unit 604.
The extracting unit 601 extracts an image feature from an input image. The input image has the same specification as that of the input images described above with reference to the embodiment of FIG. 1.
With respect to each of a plurality of orientation classes, the mapping unit 602 obtains 3-D object posture information corresponding to the image feature based on a mapping model corresponding to the orientation class, for mapping the image feature to the 3-D object posture information. The mapping model is that described above with reference to the embodiment of FIG. 1.
The probability calculating unit 603 calculates, for each of the orientation classes, the joint probability of a joint feature including the image feature and the corresponding 3-D object posture information according to a joint probability distribution model based on the single probability distribution models for the orientation classes, and calculates the conditional probability of the image feature conditioned on the corresponding 3-D object posture information based on the joint probability, e.g., as p(X|Y) = p(X, Y)/p(Y). The joint probability distribution model is that described above with reference to the embodiment of FIG. 1.
The estimating unit 604 estimates the orientation class corresponding to the maximum of the conditional probabilities p(X|Y) calculated for all the possible orientation classes as the posture orientation of the object in the input image.
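Putting the pieces together, a hedged Python sketch of the whole estimation stage follows. It assumes the per-class mapping models and mixture parameters produced by the training embodiments, uses scipy's multivariate normal density, and treats the block layout of the joint mean and covariance (X dimensions first, then Y) as an assumption of this illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_orientation(x, models, gmm_params):
    """Estimate the posture orientation class of image feature x.
    models[i]: p x m mapping model A_i of orientation class i.
    gmm_params[i]: (rho_i, u_i, Sigma_i) of the joint [X, Y] Gaussian."""
    m = x.shape[0]
    best_i, best_cond = None, -np.inf
    for i, A in enumerate(models):
        y = A @ x                               # 3-D posture hypothesis
        z = np.concatenate([x, y])              # joint feature [X, Y]
        rho, u, S = gmm_params[i]
        joint = rho * multivariate_normal.pdf(z, mean=u, cov=S)
        # Marginal p(Y) from the Y block of the i-th Gaussian
        p_y = multivariate_normal.pdf(y, mean=u[m:], cov=S[m:, m:])
        cond = joint / p_y                      # p(X | Y), up to scaling
        if cond > best_cond:
            best_i, best_cond = i, cond
    return best_i
```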
As shown in FIG. 7, the method 700 of estimating a posture orientation of an object in an image begins with extracting an image feature from an input image.
At step 705, with respect to each of a plurality of orientation classes, 3-D object posture information corresponding to the image feature is obtained based on a mapping model corresponding to the orientation class, for mapping the image feature to the 3-D object posture information. The mapping model is that described above with reference to the embodiment of FIG. 1.
At step 707, a joint probability of a joint feature including the image feature and the corresponding 3-D object posture information is calculated for each of the orientation classes according to a joint probability distribution model based on the single probability distribution models for the orientation classes, and a conditional probability of the image feature conditioned on the corresponding 3-D object posture information is calculated based on the joint probability. The joint probability distribution model is that described above with reference to the embodiment of FIG. 1.
At step 708, the orientation class corresponding to the maximum of the conditional probabilities p(X|Y) calculated for all the possible orientation classes is estimated as the posture orientation of the object in the input image. The method 700 ends at step 709.
As illustrated in FIG. 8, the apparatus 800 for estimating a posture orientation of an object in an image includes, in addition to units functioning in the same manner as the extracting unit 601, the mapping unit 602, the probability calculating unit 603 and the estimating unit 604 described above, a transforming unit 805.
The transforming unit 805 transforms the image feature through a feature transformation model for dimension reduction, and the transformed image feature is then used to obtain the 3-D object posture information. The feature transformation model may be that described above with reference to the embodiment of FIG. 4.
In the above embodiment, because the image feature transformed with the feature transformation model has fewer dimensions, the subsequent processing cost for mapping and probability calculation is reduced.
As shown in FIG. 9, the method 900 of estimating a posture orientation of an object in an image begins with extracting an image feature from an input image.
At step 904, the image feature is transformed through a feature transformation model for dimension reduction, and the transformed image feature is then used to obtain the 3-D object posture information. The feature transformation model may be that described above with reference to the embodiment of FIG. 4.
At step 905, as in step 705, with respect to each of a plurality of orientation classes, 3-D object posture information corresponding to the image feature is obtained based on a mapping model corresponding to the orientation class, for mapping the image feature to the 3-D object posture information.
At step 907, as in step 707, a joint probability of a joint feature including the image feature and the corresponding 3-D object posture information for each of the orientation classes is calculated according to a joint probability distribution model based on single probability distribution models for the orientation classes, and a conditional probability of the image feature in condition of the corresponding 3-D object posture information is calculated based on the joint probability.
At step 908, as in step 708, the orientation class corresponding to the maximum of the conditional probabilities calculated for all the possible orientation classes is estimated as the posture orientation of the object in the input image. The method 900 ends at step 909.
Although the embodiments of the present invention are described above with respect to images, they may also be applied to videos, where a video is processed as a sequence of images.
In FIG. 10, a central processing unit (CPU) 1001 performs various processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. Data required when the CPU 1001 performs the various processes is also stored in the RAM 1003 as required.
The CPU 1001, the ROM 1002 and the RAM 1003 are connected to one another via a bus 1004. An input/output interface 1005 is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, or the like; an output section 1007 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker or the like; the storage section 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs a communication process via a network such as the Internet.
A drive 1010 is also connected to the input/output interface 1005 as required. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1010 as required, so that a computer program read therefrom is installed into the storage section 1008 as required.
In the case where the above-described steps and processes are implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1011.
One skilled in the art should note that this storage medium is not limited to the removable medium 1011 having the program stored therein as illustrated in FIG. 10, which is distributed separately from the apparatus to provide the program to a user. Alternatively, the storage medium may be the ROM 1002 or a hard disk contained in the storage section 1008, in which the program is stored and which is distributed to the user together with the apparatus containing it.
The present invention is described in the above by referring to specific embodiments. One skilled in the art should understand that various modifications and changes can be made without departing from the scope as set forth in the following claims.
Number | Date | Country | Kind
---|---|---|---
200910137360.5 | Apr 2009 | CN | national

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/CN10/72150 | 4/23/2010 | WO | 00 | 10/24/2011