This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2007-244294, filed on Sep. 20, 2007; the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to an apparatus, a method, and a computer program that search for and extract an object corresponding to a designated object from different images, by extracting objects such as persons from the images and then comparing features of the objects, such as colors, patterns, and shapes.
2. Description of the Related Art
There have been known methods that track or search for a specific person or object, by extracting variation regions including a moving object from a plurality of images, calculating a similarity or the like of the variation regions extracted respectively from the images, and associating the variation regions with each other.
For example, technologies that track a designated person by extracting a color region similar to registered color information of the designated person are disclosed in Takuro Sakiyama, Nobutaka Shimada, Jun Miura, and Yoshiaki Shirai (2003, September), Tracking designated person using color information, FIT2003 (2nd Forum on Information Technology), No. I-085, pp. 183-184 (hereinafter, “Document 1”), and in Takuro Sakiyama, Jun Miura, and Yoshiaki Shirai (2004, September), Tracking designated person based on color features, FIT2004 (3rd Forum on Information Technology), No. I-036, pp. 79-80. JP-A 2005-202938 (KOKAI) discloses a technology that tracks/searches for a specific person by dividing each of the variation regions extracted from an image into blocks, and then associating the person using color information of the blocks.
A method disclosed in JP-A 2005-202938 (KOKAI), however, has a premise that the entire body of a person designated as a search object is extracted as a variation region. This poses a problem in that associating, for example, an image in which only an upper body is visible with an image in which the entire body is visible is impossible or difficult. Specifically, the problem is that when the body of a designated person is partially hidden by other persons or the like, the designated person cannot be tracked/searched for properly.
The method disclosed in Document 1 tracks and steadily updates a human region, thereby enabling search of the human region even when the body is partially hidden. However, the search is impossible when a tracking process cannot be applied, for example, when still images are compared with each other.
According to one aspect of the present invention, an object detecting apparatus includes a storage unit configured to store image features of first segment regions divided from a first region and extraction reliabilities of the respective first segment regions. The first region is extracted, as a region including an object to be searched for, from learning image data. The apparatus also includes a receiving unit that receives input image data; an extracting unit that extracts a second region that is a candidate region including the object from the input image data; a reliability calculating unit that calculates extraction reliabilities of second segment regions divided from the second region; a feature calculating unit that calculates features of the respective second segment regions; a similarity calculating unit; and a determining unit. The similarity calculating unit calculates (1) segment similarities for respective combinations of the first segment regions and the second segment regions corresponding to the first segment regions, the segment similarities indicating similarities between the features of the first segment regions and the features of the second segment regions, (2) products of the segment similarities, the reliabilities of the first segment regions, and the reliabilities of the second segment regions, for the respective combinations, and (3) an object similarity indicating a sum of the products. The determining unit, when the object similarity is greater than a predetermined first threshold, determines that the second region includes the object.
Exemplary embodiments of an apparatus, a method, and a computer program according to the present invention will be described in detail below with reference to the accompanying drawings. Although a person is described as the search object, the search object is not limited to a person and can be any object, such as a vehicle or an animal, appearing in an image.
An object detecting apparatus according to a first embodiment of the present invention detects a similar human region by dividing an extracted human region into a predetermined number of segment regions and then summing up similarities (segment similarities) calculated for the respective segment regions, taking the sum as a similarity (object similarity) of the entire human region. By comparing the aspect ratio of the extracted human region with a reference value, the apparatus determines whether the search object, i.e., a person, is partially hidden. For the hidden portion, a low extraction reliability is calculated, and the sum of the segment similarities weighted by the reliabilities is taken as the object similarity.
The storage unit 121 stores therein learning image data, from which a search object to be compared is extracted in advance, in association with calculation results such as features calculated by the feature calculating unit 105 (described later) or the like.
The features 1 to J represent features of the respective J segment regions made by dividing a human region. The reliabilities 1 to J represent reliabilities of the respective J segment regions. The j-th feature and the j-th reliability are referred to as Fj and cj, respectively. Methods of calculating these features and reliabilities will be described later.
The storage unit 121 can be constituted by any commonly used storage medium, such as a hard disk drive (HDD), an optical disk, a memory card, or a random access memory (RAM).
The receiving unit 101 receives image data, such as moving picture data or still picture data. The receiving unit 101 may receive image data captured by, for example, a video camera (not shown), or image data retrieved from an image file stored in a memory device of the storage unit 121. Alternatively, the receiving unit 101 may receive image data supplied from an external apparatus via a network.
The extracting unit 102 extracts a human region from the input image data. From images captured by a fixed camera, the extracting unit 102 extracts, as a human region, a variation region obtained by a background subtraction method, for example. For images captured by a non-fixed camera, still pictures, or video images to which background subtraction methods cannot be applied, the extracting unit 102 can use a method for directly searching for a human region by collating it against a person pattern, a method for extracting a human region with high accuracy by collating parts of the contour of an extraction object, or other methods.
The extracting unit 102 can use not only these methods for extracting a human region, but any methods that enable extraction of all or a portion of an object person from image data.
The reliability calculating unit 103 calculates an extraction reliability of each of the segment regions in an extracted human region. The extraction reliability represents a certainty level of the extraction. When an extracted segment region is certainly a portion of a person, the reliability takes a large value; when the segment region is highly likely not the person, for example because it is hidden, the reliability takes a small value.
The reliability calculating unit 103 calculates an extraction reliability, according to an extraction method used by the extracting unit 102 for extracting a human region. For example, when using a method for extracting a human region by collating the human region to a predetermined pattern, the reliability calculating unit 103 can find a reliability by calculating a similarity between the human region and the pattern.
The reliability calculating unit 103 calculates an aspect ratio of an extracted human region and compares the aspect ratio with a predetermined reference aspect ratio, so as to estimate whether a hidden region exists. The hidden region is a region in which a human region of a designated person is hidden by other persons or the like. When estimating that a hidden region does exist, the reliability calculating unit 103 expands the human region by adding the hidden region thereto so that the aspect ratio matches a reference value.
The reliability calculating unit 103 calculates a reliability of each of the segment regions made by the region dividing unit 104 (described later) dividing a human region. Specifically, to segment regions made by dividing the region other than the hidden region, the reliability calculating unit 103 allocates the reliability obtained when the entire human region is extracted. For segment regions made by dividing the hidden region, the reliability calculating unit 103 calculates a smaller reliability than that of the region other than the hidden region. For example, when the reliability is not less than 0 and not more than 1, the reliability calculating unit 103 assigns a reliability of 0.1 to the hidden region.
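The allocation described above can be sketched as follows. The embodiment does not prescribe an implementation; the function and parameter names are hypothetical, and the value 0.1 for hidden segments follows the example given in the text.

```python
def allocate_reliabilities(num_segments, hidden_segments,
                           base_reliability=1.0, hidden_reliability=0.1):
    """Assign an extraction reliability to each of J segment regions.

    Segments whose indices appear in `hidden_segments` (segments made by
    dividing the added hidden region) receive a low fixed reliability,
    e.g. 0.1; all other segments receive the reliability obtained when
    the human region was extracted.
    """
    return [hidden_reliability if j in hidden_segments else base_reliability
            for j in range(num_segments)]

# Example: a human region divided into 5 vertical slices, bottom two hidden.
reliabilities = allocate_reliabilities(5, hidden_segments={3, 4})
# -> [1.0, 1.0, 1.0, 0.1, 0.1]
```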
The region dividing unit 104 divides an extracted human region, or a human region including an added hidden region, into a plurality of segment regions. The region dividing unit 104 can divide the human region into segment regions of any shape and size. To increase collation accuracy, the region dividing unit 104 preferably divides the human region into J equal slices stacked in the vertical direction of the human region.
For an extracted human region including a hidden region added by the reliability calculating unit 103, the region dividing unit 104 may divide the region at the interface between the extracted human region and the hidden region. Alternatively, the region dividing unit 104 may divide the region into J equal pieces regardless of the position of the interface with the hidden region. In this case, for a segment region including both the extracted human region and the hidden region, the reliability calculating unit 103 calculates a weighted reliability according to the ratios of the human region and the hidden region within the segment region.
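The weighted reliability for a segment straddling the interface might be computed as below. This is a sketch under the assumption (not stated explicitly in the text) that the weighting is a linear blend by area ratio; the names are illustrative.

```python
def mixed_segment_reliability(human_fraction,
                              base_reliability=1.0, hidden_reliability=0.1):
    """Reliability of a segment region that contains both the extracted
    human region and the added hidden region.

    The two reliabilities are blended according to the fraction of the
    segment's area that belongs to the extracted human region.
    """
    return (human_fraction * base_reliability
            + (1.0 - human_fraction) * hidden_reliability)

# A segment that is 70% extracted human region and 30% hidden region
c = mixed_segment_reliability(0.7)
# c ≈ 0.73
```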
The feature calculating unit 105 calculates a feature of the image data, such as color, pattern, or shape, for each of the divided segment regions. As features to be calculated, features used in related-art image search may be used, such as color histograms and texture (pattern) features obtained by Fourier descriptors and wavelet coefficients. For a segment region corresponding to the hidden region, the feature calculating unit 105 likewise calculates a feature from the image data in that region. Alternatively, the feature calculating unit 105 may use, as the feature of such a region, a fixed value defined as an average feature, for example.
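As one concrete possibility for the color-histogram feature mentioned above, a coarse quantized RGB histogram per segment region could be computed as follows. This is a sketch only; the embodiment does not fix the bin count or the color space, and the function name is hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Coarse RGB color histogram of one segment region.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channel values.
    Each channel is quantized into `bins` levels (bins must divide 256),
    and the histogram is normalized to sum to 1 so that segments of
    different sizes remain comparable.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Example: a segment that is half pure red, half pure blue.
feature = color_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)])
```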
The similarity calculating unit 106 calculates an object similarity, using features stored in advance in the storage unit 121 and features calculated using search object image data designated for searching for a search object. The object similarity represents a level of similarity between a human region extracted from the learning image data and a human region extracted from the search object image data.
The determining unit 107 compares a calculated object similarity with a predetermined threshold. If the object similarity is greater than the threshold, the determining unit 107 determines that a search object extracted from the learning image data matches a search object extracted from the search object image data.
Referring to
To begin with, the receiving unit 101 receives learning image data (Step S301). From the learning image data thus input, the extracting unit 102 extracts a human region (Step S302).
The reliability calculating unit 103 performs a hidden region determination process for determining whether a hidden region exists in a search object (Step S303). The hidden region determination process will be described in detail later.
After the hidden region determination process, the region dividing unit 104 divides the extracted human region, or the human region including the added hidden region, into J segment regions (Step S304).
On the contrary,
Referring back to
The feature calculating unit 105 calculates a feature Fj for each of the divided J segment regions (Step S306). The feature calculating unit 105 then stores in the storage unit 121 the feature Fj and the reliability cj both calculated for each of the segment regions (Step S307), and ends the learning process.
The hidden region determination process at Step S303 is described in detail.
To begin with, the reliability calculating unit 103 calculates an aspect ratio r of an extracted human region (Step S601). The aspect ratio r is represented as r=h/w, where w is the width and h is the height of the human region.
The reliability calculating unit 103 determines whether the calculated aspect ratio r is smaller than a reference value R found in advance as an aspect ratio of an average person (Step S602).
If the aspect ratio r is smaller than the reference value R (Yes at Step S602), the reliability calculating unit 103 estimates that a hidden region exists in the search object (Step S603). The reliability calculating unit 103 adds the hidden region to the extracted human region such that the aspect ratio matches the reference value (Step S604). Specifically, the reliability calculating unit 103 adds the hidden region to at least one of the upper and the lower portions of the human region, according to the ratio of the hidden region, which is calculated as (R−r)/R.
The reliability calculating unit 103 cannot determine whether the hidden region lies in the upper portion, the lower portion, or both, or in what ratio. Thus, the reliability calculating unit 103 may modify the human region by adding hidden regions to the upper and the lower portions in a plurality of combinations, such that the hidden regions in each combination collectively satisfy the ratio (R−r)/R.
For each of the combinations as described, subsequent processes (a region dividing process, a feature calculating process, and a feature storing process) are carried out. Further, in a search process (described later), a similarity is determined using a feature and a reliability both calculated for each of the combinations, and a combination achieving the highest similarity is employed.
With regard to a height direction of a person, because the lower body is often hidden, the reliability calculating unit 103 may limit a hidden region to a region corresponding to the lower body. This enables reduction in processing load of the hidden region determination process.
The foregoing describes only the case in which the aspect ratio r=h/w is smaller than the reference value, i.e., a hidden region exists in the height direction. When a hidden region exists in the width direction, determination is made as to whether the aspect ratio is greater than the reference value R (=2), so that similar hidden region estimating/adding processes may be performed.
Alternatively, by replacing the aspect ratio by r′=w/h, determination is made as to whether the aspect ratio is smaller than the reference value R′=W/H (=0.5) thus replaced accordingly, so that the similar hidden region estimating/adding processes may be performed.
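The aspect-ratio test and region expansion of Steps S601 to S604 can be sketched as follows, for the height-direction case. The function name and the return convention are illustrative, not prescribed by the embodiment.

```python
def estimate_hidden_region(w, h, R=2.0):
    """Estimate a hidden region from the aspect ratio r = h / w.

    If r is smaller than the reference aspect ratio R of an average
    person, part of the body is assumed hidden in the height direction,
    and the region is expanded so that its aspect ratio matches R.
    Returns the expanded height and the hidden-region ratio (R - r) / R.
    """
    r = h / w
    if r >= R:
        return h, 0.0           # no hidden region in the height direction
    hidden_ratio = (R - r) / R  # fraction of expanded region that is hidden
    expanded_h = R * w          # height after adding the hidden region
    return expanded_h, hidden_ratio

# Example: a region 50 pixels wide and 60 pixels high, reference R = 2
expanded_h, hidden_ratio = estimate_hidden_region(50, 60)
# expanded_h = 100.0, hidden_ratio = 0.4
```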
Such estimation of a hidden region as shown in
On the contrary, the estimation of a hidden region as shown in
When the extracting unit 102 uses only the person search process, only the human region (the whole body) and its extraction reliability are obtained. In this case, the extraction reliability is used as the reliability of each of the segment regions made by the region dividing unit 104.
Referring to
On the contrary,
Referring to
To begin with, the receiving unit 101 receives search object image data (Step S1501). Subsequent Steps S1502 to S1506, including a human region extraction process, a hidden region determination process, a reliability calculating process, a region dividing process, and a feature calculating process, are the same as Steps S302 to S306 in the learning process, and the descriptions thereof are not repeated.
After features of search object image data are calculated, the similarity calculating unit 106 calculates an object similarity using the features thus calculated and the features stored in the storage unit 121 (Step S1507). A specific method of calculating an object similarity is now described.
The object ID of an object to be compared is represented by i, its features by F(i,j) (j=1 to J), and its reliabilities by c(i,j). Further, features F(input,j) and reliabilities c(input,j) are calculated for the human region extracted from the input search object image data.
The similarity calculating unit 106 calculates an object similarity S(input,i), which is a similarity of the human regions, using Equations (1) and (2):
Similarity(F(input,j), F(i,j)) represents a segment similarity, i.e., a similarity between the feature F(input,j) and the feature F(i,j). For features represented by vectors, for example, the similarity calculating unit 106 calculates, as a segment similarity, a correlation value of the feature vectors. For features represented by histograms, the similarity calculating unit 106 calculates a segment similarity by histogram intersection. As such, the similarity calculating unit 106 can use related-art methods for calculating a similarity of any features, depending on the methods used for calculating the features.
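Histogram intersection, one of the segment-similarity measures named above, can be sketched in a few lines. The function name is illustrative.

```python
def histogram_intersection(f1, f2):
    """Segment similarity between two histogram features.

    Sums, bin by bin, the smaller of the two values.  For normalized
    histograms the result lies in [0, 1] and equals 1 only when the
    histograms are identical.
    """
    return sum(min(a, b) for a, b in zip(f1, f2))

# Example with two normalized 3-bin histograms
s = histogram_intersection([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
# s ≈ 0.9
```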
Equation (2) calculates the object similarity S(input,i) by normalizing the weighted sum of segment similarities with the overall reliability. For a partially hidden region, Equation (2) provides a more stable similarity than Equation (1).
As shown in Equations (1) and (2), a segment similarity is multiplied by reliabilities of the respective segment regions of both human regions to be compared, and obtained values for all the segment regions are summed up. In this way, an object similarity of the entire human region is calculated. Specifically, when the reliability is low, a segment similarity of corresponding segment regions is weighted less than the overall object similarity. Accordingly, for example, even when a hidden region is added, calculation of an object similarity is less affected, so that stable matching is achieved when a hidden region exists.
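Equations (1) and (2) are not reproduced in this text, but the description above determines their form: a reliability-weighted sum of segment similarities, optionally normalized by the sum of the reliability products. The sketch below reconstructs that computation; the function signature is hypothetical.

```python
def object_similarity(seg_sim, f_in, f_i, c_in, c_i, normalize=True):
    """Object similarity along the lines of Equations (1) and (2).

    Each segment similarity is weighted by the product of the two
    segment reliabilities and the weighted values are summed
    (Equation (1)).  With normalize=True, the sum is divided by the
    overall reliability, i.e. the sum of the reliability products
    (Equation (2)), which is more stable when a region is partly hidden.
    """
    weighted = sum(seg_sim(fa, fb) * ca * cb
                   for fa, fb, ca, cb in zip(f_in, f_i, c_in, c_i))
    if not normalize:
        return weighted
    total_weight = sum(ca * cb for ca, cb in zip(c_in, c_i))
    return weighted / total_weight if total_weight else 0.0

# Example: 3 segments with identical features; the last one is hidden
# on the input side (reliability 0.1), so it is weighted down.
sim = object_similarity(lambda a, b: 1.0 if a == b else 0.0,
                        ["A", "B", "C"], ["A", "B", "C"],
                        [1.0, 1.0, 0.1], [1.0, 1.0, 1.0])
# sim = (1 + 1 + 0.1) / (1 + 1 + 0.1) = 1.0
```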
With the hidden region, even if the weighting makes the calculation less affected, the similarity of features there (the segment similarity) is highly likely to be lowered, reducing the accuracy of the overall similarity (the object similarity). For example, when a person whose entire body is visible is compared with a person whose upper body alone is visible, the segment similarity of the portion corresponding to the hidden lower body is low. This lowers the object similarity, causing a matching failure.
In this case, by evaluating a similarity based on a non-hidden portion (upper body) alone, stable matching can be achieved when a hidden region exists. For example, a similarity may be calculated using Equations (3) to (5) below:
where min(A,B) denotes the smaller value of A and B, D is the set of segment regions d whose reliabilities c(input,d) and c(i,d) are both equal to or greater than a threshold T, and num(D) denotes the number of elements of the set D.
By calculating an object similarity in this way, segment regions having low reliabilities, such as a hidden region, are excluded from the set D and not taken into account in the calculation of the object similarity. This stabilizes matching when a hidden region exists.
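Equations (3) to (5) are likewise not reproduced here, but the description determines their behavior: segments contribute only when both reliabilities reach the threshold T, and the contributing segment similarities are averaged over num(D). The following sketch reconstructs that; the signature and threshold default are illustrative.

```python
def object_similarity_thresholded(seg_sim, f_in, f_i, c_in, c_i, T=0.5):
    """Object similarity in the spirit of Equations (3) to (5).

    A segment d belongs to the set D only when min(c_in[d], c_i[d]) >= T;
    the segment similarities over D are then averaged by num(D).
    Segments with a low reliability, such as an added hidden region,
    are thus excluded from the calculation entirely.
    """
    D = [d for d in range(len(c_in)) if min(c_in[d], c_i[d]) >= T]
    if not D:
        return 0.0
    return sum(seg_sim(f_in[d], f_i[d]) for d in D) / len(D)

# Example: the third segment (hidden, reliability 0.1) is excluded from D,
# so its mismatching feature does not lower the similarity.
sim = object_similarity_thresholded(lambda a, b: 1.0 if a == b else 0.0,
                                    ["A", "B", "C"], ["A", "B", "X"],
                                    [1.0, 1.0, 0.1], [1.0, 1.0, 1.0])
# sim = (1 + 1) / 2 = 1.0
```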
After the object similarity is calculated using Equations (1) and (2), or (3) to (5), the determining unit 107 determines whether the object similarity is greater than a predetermined threshold (Step S1508). If the object similarity is greater than the threshold (Yes at Step S1508), the determining unit 107 determines that a person indicated by an object ID=i matches a person in the input search object image data (Step S1509).
If the object similarity is not greater than the threshold (No at Step S1508), the determining unit 107 determines that the person indicated by the object ID=i does not match the person in the input search object image data (Step S1510).
As such, in the object detecting apparatus according to the first embodiment, by comparing the aspect ratio of an extracted human region with a reference value, determination is made as to whether the search object, i.e., a person, is partially hidden. Then, for the hidden portion, a low extraction reliability is calculated, and the sum of segment similarities weighted by the reliabilities can be found as an object similarity. Using the object similarity thus calculated, a matching relationship between human regions can be determined. Accordingly, even when the search object is partially hidden, the search object can be searched for properly by collating images with each other.
An object detecting apparatus according to a second embodiment of the present invention extracts a region corresponding to a whole person (human region) by extracting regions of respective body parts, i.e., constituting elements of a person (constituting regions).
The second embodiment differs from the first embodiment regarding functions of the extracting unit 1602, the reliability calculating unit 1603, and the region dividing unit 1604. Other structures and functions are the same as those of the object detecting apparatus 100 according to the first embodiment, shown in the block diagram of
The extracting unit 1602 extracts constituting regions that correspond to respective body parts of a person, from input image data. For example, the extracting unit 1602 extracts each of the constituting elements that correspond to nine body parts: head, left shoulder, right shoulder, chest, left arm, right arm, abdomen, left leg, and right leg. As an extraction method, any related-art methods may be used, such as a method for extracting a constituting element by comparing each body part with its collation pattern.
The reliability calculating unit 1603 calculates a reliability of each segment region that is an extracted constituting region. For example, the reliability calculating unit 1603 calculates a similarity between each body part and its collation pattern as the reliability of each segment region. When the region dividing unit 1604 (described later) subdivides each constituting region, the reliability calculating unit 1603 calculates a reliability of each division unit serving as a segment region. In this case, the reliability calculating unit 1603 may calculate a similarity between the constituting region before subdivision and the collation pattern, as the reliability of each segment region after subdivision.
The region dividing unit 1604 divides an extracted human region into a plurality of segment regions. Most simply, the region dividing unit 1604 divides the human region into segment regions that correspond to constituting regions extracted for respective body parts. Alternatively, the region dividing unit 1604 may divide the human region into segment regions made by subdividing each constituting region.
Referring to
To begin with, the receiving unit 101 receives an input of learning image data (Step S1701). The extracting unit 1602 extracts a human region, by extracting each body part one by one, from the learning image data thus input (Step S1702). Specifically, by extracting each constituting element corresponding to each body part of a person, the extracting unit 1602 extracts a human region including such constituting regions.
The reliability calculating unit 1603 calculates a similarity between a constituting region and a collation pattern, as a reliability of each body part (Step S1703). Then, the region dividing unit 1604 divides the human region into segment regions (Step S1704). As described, the region dividing unit 1604 divides the human region into segment regions in units of constituting elements, or divides the human region into segment regions so as to subdivide each constituting region. When the region dividing unit 1604 subdivides each constituting region, the reliability calculating unit 1603 allocates, to the subdivided segment regions, the reliability of the corresponding constituting region before subdivision.
The feature calculating process and the feature storing process at Steps S1705 to S1706 are the same as those at Steps S306 to S307 performed by the object detecting apparatus 100 according to the first embodiment, and the descriptions thereof are not repeated.
Referring to
An image data input process, a human region extraction process, a reliability calculating process, and a region dividing process at Steps S1801 to S1804 are the same as those at Steps S1701 to S1704 of the learning process shown in
Further, a feature calculating process and a matching determination process at Steps S1805 to S1809 are the same as those at Steps S1506 to S1510 performed by the object detecting apparatus 100 according to the first embodiment, and the descriptions thereof are not repeated.
As such, according to the second embodiment, a matching relationship between search objects can be determined using the same methods used in the first embodiment, except a method for extracting a human region.
Referring to
On the contrary,
As such, the object detecting apparatus according to the second embodiment can use the same search process as in the first embodiment, even when using a method for extracting a region of each body part constituting a person. Accordingly, even when a search object is partially hidden, the search object can be searched for properly by collating images with each other.
As an alternative method for extracting a region of each body part, motion capture technology may be applied.
Specifically, the extracting unit 1602 extracts a region of each body part by performing matching with the human model, using edge information and silhouette information of an image. The reliability calculating unit 1603 calculates, as a reliability of each segment region, a matching score with respect to each body part model, obtained as a result of the matching with the human model. The region dividing unit 1604 divides the human region into body parts, according to the matching with the human model.
Referring to
The object detecting apparatus according to the first or the second embodiment includes: a controlling device such as a central processing unit (CPU) 51; a memory device such as a read only memory (ROM) 52 or a RAM 53; a communication interface (I/F) 54 that performs communication connecting to a network; an external memory device such as a hard disk drive (HDD) or a compact disc (CD) drive device; a displaying device such as a display; an input device such as a keyboard or a mouse; and a bus 61 that connects these units. The object detecting apparatus has a hardware structure using a general computer.
An object search computer program implemented by the object detecting apparatus according to the first or the second embodiment is provided written in an installable or executable file format and recorded on a computer-readable medium, such as a compact disk read only memory (CD-ROM), a flexible disk (FD), a compact disk recordable (CD-R), a digital versatile disk (DVD), or a memory. The computer-readable medium storing the object search computer program is provided as a computer program product.
Alternatively, the object search computer program implemented by the object detecting apparatus according to the first or the second embodiment may be provided by being stored in a computer connected to a network such as the Internet and downloaded via the network. Further, the object search computer program implemented by an object detecting apparatus according to the first or the second embodiment may be provided or distributed via a network such as the Internet.
Further, the object search computer program according to the first or the second embodiment may be provided as being installed in advance in a ROM or the like.
The object search computer program implemented by an object detecting apparatus according to the first or the second embodiment is configured as a module including the above-described units (the receiving unit, the extracting unit, the reliability calculating unit, the region dividing unit, the feature calculating unit, the similarity calculating unit, and the determining unit). As actual hardware, the CPU 51 (processor) reads the object search computer program from the storage medium and executes it, so that the above units are loaded into and generated on a main memory device.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.