This application is a continuation application filed under 35 U.S.C. 111(a) claiming the benefit under 35 U.S.C. 120 and 365(c) of a PCT International Application No. PCT/JP2003/000735 filed Jan. 27, 2003, in the Japanese Patent Office, the disclosure of which is hereby incorporated by reference.
1. Field of the Invention
The present invention generally relates to target object appearing position display apparatuses, and more particularly to a target object appearing position display apparatus which displays an appearing position of a target object within a movie in a form suited for classifying and analyzing features of the movie.
2. Description of the Related Art
Recently, there are increasing demands by industries typified by television stations to classify and analyze the various movie files they store, and increasing demands by individuals to classify and analyze various movie files obtained by video recording.
The target object appearing position display apparatus according to the present invention displays the features of each movie in an easily understandable manner. Hence, the target object appearing position display apparatus of the present invention is suited for use when searching for a movie file similar to a particular movie file, classifying the genre of each movie file, analyzing the relationship of commercials and television programs to their ratings, analyzing a common style of movies directed by a certain movie director, and the like.
As methods of detecting a particular object, various methods have been proposed to detect the face of a person, a horse, a car or the like. One example of such a method of detecting a particular object is proposed in Henry Schneiderman, “A statistical approach to 3D object detection applied to faces and cars”, CMU-RI-TR-00-06, 2000. In the following, a description will be given of the method of detecting the face of a person, which is often used, as an example of the method of detecting the particular object.
Various methods have been proposed to detect the appearing position of the face of the person from a still image or a movie. One example of such a detection method is proposed in Ming-Hsuan Yang and Narendra Ahuja, “Face detection and gesture recognition for human-computer interaction”, Kluwer Academic Publishers, ISBN: 0-7923-7409-6, 2001. Most of such detection methods display a detection result by adding a rectangular or circular mark 2 at the position of a detected face 1, as shown in
On the other hand, when detecting the face from a movie, the face is often detected in units of still images called frames which are the basic elements forming the movie. For example, such a detection method is proposed in Sakurai et al., “A Fast Eye Pairs Detection for Face Detection”, 8th Image Sensing Symposium Lecture Articles, pp. 557-562, 2002. Hence, when detecting the face from the movie, the detected result is also displayed by adding the rectangular or circular mark at the position of the detected face in each frame corresponding to the still image, similarly as in the case where the face is detected from the still image.
In the case of the method which displays the detected result by adding the mark at the position of the face that is detected in units of frames, the detected result amounts to a considerably large number of frames, since 30 frames generally exist per second and even a movie of only approximately one minute amounts to approximately 1800 frames. Accordingly, there was a problem in that visually confirming the position where the face is detected in each frame of the detected result is an extremely troublesome and time-consuming operation for the user. In addition, there was a problem in that it is difficult to comprehensively grasp information related to the movie as a whole, such as information indicating the position where the face was most frequently detected in the movie as a whole.
Accordingly, it is a general object of the present invention to provide a novel and useful target object appearing position display apparatus in which the problems described above are suppressed.
Another and more specific object of the present invention is to provide a target object appearing position display apparatus which enables an appearing position of a target object within a movie to be easily grasped.
Still another object of the present invention is to provide a target object appearing position display apparatus comprising an object detecting part configured to detect one or a plurality of specified target objects, with respect to each frame of a movie; a position data holding part configured to hold position data of each target object that is detected; an appearing frequency computing part configured to compute an appearing frequency of each target object for each position; and an intensity display part configured to display an appearing frequency of each target object by an intensity value of a corresponding pixel. According to the target object appearing position display apparatus of the present invention, it is possible to easily grasp the appearing position of the target object within the movie.
Other objects and further features of the present invention will be apparent from the following detailed description when read in conjunction with the accompanying drawings.
Each of the functions of the object detecting part 11, the position data holding part 12, the appearing frequency computing part 13, the intensity display part 14 and the image arranging part 15 may be realized by hardware or software. In the following description, it is assumed for the sake of convenience that each of the functions of the object detecting part 11, the position data holding part 12, the appearing frequency computing part 13, the intensity display part 14 and the image arranging part 15 is realized by software, that is, by a processor such as a CPU of a known information processing apparatus such as a general purpose computer. The known information processing apparatus need only include at least a CPU and a memory.
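As a purely illustrative sketch of such a software realization, the parts could be organized as separate modules invoked in sequence for each frame. All class and method names below are assumptions introduced for illustration only, not identifiers from the described embodiment, and the image arranging part 15 is omitted for brevity.

```python
from collections import Counter

class ObjectDetectingPart:
    def detect(self, frame):
        """Return a list of (x, y) coordinates of target objects found in one frame;
        a concrete detector (e.g. the face detector described below) would go here."""
        return []

class PositionDataHoldingPart:
    def __init__(self):
        self.positions = []

    def hold(self, coordinates):
        self.positions.extend(coordinates)

class AppearingFrequencyComputingPart:
    def compute(self, positions):
        """Appearance rate per coordinate (detailed later as steps ST1 through ST5)."""
        counts = Counter(positions)
        total = sum(counts.values())
        return {xy: c / total for xy, c in counts.items()} if total else {}

class IntensityDisplayPart:
    def display(self, rates):
        """Map appearance rates to 8-bit luminance values (detailed later as step ST6)."""
        return {xy: int(rate * 255) for xy, rate in rates.items()}

def process_movie(frames):
    detector, holder = ObjectDetectingPart(), PositionDataHoldingPart()
    for frame in frames:
        holder.hold(detector.detect(frame))
    rates = AppearingFrequencyComputingPart().compute(holder.positions)
    return IntensityDisplayPart().display(rates)
```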
The target object may be the face of a person, a horse, a car or the like. For the sake of convenience, however, the following description will be given by referring to the case where the target object is the face of a person.
The object detecting part 11 receives each frame of the movie as an input, and detects and outputs the position of the target object if the target object, that is, the face of the person, appears within the frame. The movie that is input to the object detecting part 11 may be picked up by a known imaging means such as a video camera and input in real-time or, may be stored in a known storage means such as a disk and a memory and input by being read out from the storage means. The target object may be specified by the user by a known method using an input device such as a keyboard and a mouse of the known information processing apparatus, for example. Various methods have been proposed to detect the face as the target object, and one example of the method detects the face in the following manner.
First, in order to determine face candidates within the image, color information is used, and a color satisfying a certain threshold value is extracted as the skin color. Then, with respect to each extracted pixel having the skin color, a distance is computed between a feature value that is obtained by subjecting a luminance value to a Gabor transform and a feature value that is obtained by subjecting, to a Gabor transform, a luminance value of an eye portion of a face image registered in advance within a dictionary, and the pixel is extracted as the eye if the distance is less than or equal to a preset threshold value. The face candidate including the extracted eye is detected as the face. For example, such a detecting method is proposed in L. Wiskott et al., “Face recognition by elastic bunch graph matching”, PAMI vol. 19, no. 7, pp. 775-779, 1997. This proposed detecting method uses feature values that have been subjected to the Gabor transform, but there is also a method of performing pattern matching simply using the luminance values as the feature values.
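The following is a rough, non-authoritative sketch of this kind of candidate extraction and eye matching using OpenCV; the skin-color range, the Gabor filter parameters, the patch size and the distance threshold are illustrative assumptions rather than values taken from the cited method.

```python
import cv2
import numpy as np

# Illustrative skin-color range in the YCrCb color space (assumed thresholds).
SKIN_LO = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HI = np.array([255, 173, 127], dtype=np.uint8)

def gabor_feature(gray_patch):
    """Feature vector: mean responses of a small Gabor filter bank (assumed parameters)."""
    patch = gray_patch.astype(np.float32)
    feats = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel((9, 9), 2.0, theta, 6.0, 0.5, 0.0)
        feats.append(cv2.filter2D(patch, -1, kernel).mean())
    return np.array(feats)

def detect_eye_pixels(frame_bgr, eye_dictionary_feature, dist_threshold=1.0, patch=9):
    """Return (x, y) skin-colored pixels whose Gabor feature is close to a
    pre-registered eye feature (a vector built with gabor_feature)."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, SKIN_LO, SKIN_HI)   # face candidates by skin color
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    half = patch // 2
    hits = []
    ys, xs = np.nonzero(skin_mask)
    for y, x in zip(ys, xs):                           # slow but simple per-pixel loop
        if y < half or x < half or y + half >= gray.shape[0] or x + half >= gray.shape[1]:
            continue
        feat = gabor_feature(gray[y - half:y + half + 1, x - half:x + half + 1])
        if np.linalg.norm(feat - eye_dictionary_feature) <= dist_threshold:
            hits.append((x, y))
    return hits
```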
Of course, the face detection method itself and the target object detection method itself are not limited to the detection method described above.
The object detecting part 11 does not need to detect the target object with respect to all frames of the movie, and may detect the target object only with respect to the frames satisfying a prespecified condition. For example, the condition may be specified so that the target object is detected only with respect to the frames extracted at predetermined intervals, or only with respect to the frames having a large change in the image feature value. By detecting the target object only with respect to the frames satisfying the prespecified condition, it is possible to reduce the time required to detect the target object.
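As an illustration only, the two example conditions mentioned above (a fixed sampling interval, or a large change in an image feature value) might be combined as follows; the interval, the histogram feature and the change threshold are assumptions.

```python
import cv2
import numpy as np

def select_frames(frames, interval=30, change_threshold=0.3):
    """Yield (index, frame) only for frames satisfying a prespecified condition:
    every `interval`-th frame, or a frame whose color histogram differs strongly
    from the previous frame (both criteria and their values are illustrative)."""
    prev_hist = None
    for i, frame in enumerate(frames):
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, None, 1.0, 0.0, cv2.NORM_L1).flatten()
        changed = (prev_hist is not None and
                   cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
                   > change_threshold)
        prev_hist = hist
        if i % interval == 0 or changed:
            yield i, frame
```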
Furthermore, the object detecting part 11 may detect a single target object or detect a plurality of target objects such as the face of the person and a car. When detecting a plurality of target objects, it is possible to employ a method of detecting the plurality of target objects in one process or, a method of detecting the plurality of target objects by carrying out a process of detecting the single object a plurality of times.
A step S6 decides whether or not the process has been repeated for all pixels, and if the decision result is YES, the process returns to the step S3. If the decision result in the step S6 is NO, a step S7 decides whether or not the number of faces, which are the target objects specified by the user, is less than or equal to a threshold value. The process returns to the step S6 if the decision result in the step S7 is NO. On the other hand, if the decision result in the step S7 is YES, a step S8 extracts the skin color from the frame image, and a step S9 subjects the luminance value to a Gabor transform with respect to each pixel having the extracted skin color. A step S10 computes a distance (or error) between the feature value that is obtained by this Gabor transform and a feature value that is obtained by subjecting the luminance value of the eye portion of the face image that is registered in advance in a dictionary to a Gabor transform. A step S11 decides whether or not the computed distance is less than or equal to a threshold value. If the decision result in the step S11 is NO, a step S12 judges that something other than the face is detected, and the process returns to the step S6. On the other hand, if the decision result in the step S11 is YES, a step S13 judges that the face is detected, and the process returns to the step S6. The process shown in
The position data holding part 12 holds the position coordinate of the target object (face) detected by the object detecting part 11. In this case, since a plurality of pixels corresponding to one detected target object exist in most cases, it is possible, for example, to hold all coordinates of the pixels corresponding to the one detected target object, to hold the coordinate of a prespecified location (for example, the center of gravity) of a region of the detected target object, or to hold the coordinate of a prespecified part (for example, an eye, nose, mouth and the like when the face is the target object) of the region of the detected target object. Accordingly, in a case where it is desirable to eliminate the effects of an extremely small object, for example, it is possible to prespecify a condition that the coordinate will not be held with respect to an object that is smaller than a predetermined size. In addition, by specifying a particular portion within the object, it is possible in general to more accurately grasp the position of the target object spanning a plurality of pixels.
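For instance, holding only the center of gravity of each detected region while ignoring regions below a minimum size could be sketched as follows; the region representation and the minimum-size value are assumptions made for illustration.

```python
import numpy as np

def hold_positions(detected_regions, min_pixels=25):
    """Given detected regions as lists of (x, y) pixels, keep one representative
    coordinate (the center of gravity) per region, skipping very small regions."""
    held = []
    for region in detected_regions:
        if len(region) < min_pixels:        # ignore extremely small objects
            continue
        pts = np.asarray(region, dtype=np.float64)
        cx, cy = pts.mean(axis=0)           # center of gravity of the region
        held.append((int(round(cx)), int(round(cy))))
    return held
```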
It is possible to hold the position data for each prespecified condition in order to independently count the appearing frequency of the detected target objects for each prespecified condition by separating the detected target objects according to each prespecified condition. For example, it is possible to specify each direction of the target object as the condition, and hold the position data for each direction. Various other conditions may be specified, such as the kind of the target object and the size of the target object.
The appearing frequency computing part 13 computes the appearing frequency of the target object at each coordinate, based on the position data held in the position data holding part 12. The appearing frequency of the target object may be computed by a computing process including the following steps ST1 through ST5; a sketch of this computation is shown after the steps.
Step ST1: Initialize an appearance number C of the target object at each coordinate to 0.
Step ST2: Count (increment) the appearance number C at each coordinate at which the target object is detected.
Step ST3: Compute a sum total S of the appearance numbers C of the target object.
Step ST4: Compute an appearance rate R=C/S by dividing the appearance number C of the target object at each coordinate by S.
Step ST5: Compute an intensity value I=R×255 by multiplying the appearance rate R by a maximum luminance value (255 in the case of 8 bits).
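Assuming the held position data are pixel coordinates and the intensity image has the same dimensions as the frames, steps ST1 through ST5 might be sketched as follows. The returned array can be handed directly to the intensity display part, since step ST6 simply uses these values as the luminance of the corresponding pixels.

```python
import numpy as np

def compute_intensity_image(positions, height, width):
    """Steps ST1-ST5: count appearances per coordinate, normalize by the total,
    and scale the appearance rate to an 8-bit intensity value."""
    counts = np.zeros((height, width), dtype=np.float64)   # ST1: C = 0 everywhere
    for x, y in positions:                                  # ST2: increment C
        counts[y, x] += 1
    total = counts.sum()                                    # ST3: S = sum of C
    if total == 0:
        return np.zeros((height, width), dtype=np.uint8)
    rates = counts / total                                  # ST4: R = C / S
    return (rates * 255).astype(np.uint8)                   # ST5: I = R * 255
```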
It is possible to independently count the appearing frequency of the detected target objects for each prespecified condition by separating the detected target objects according to each prespecified condition. For example, it is possible to specify each direction of the target object as the condition, and count the appearing frequency for each direction. Various other conditions may be specified, such as the kind of the target object, the size of the target object and the appearance number of the target object.
The step S35 decides whether or not the process has been repeated for all pixels of the target object, and if the decision result is YES, the process returns to the step S31. If the decision result in the step S35 is NO, a step S36 counts the appearance numbers C of the target object stored in the memory. In addition, a step S37 obtains the sum total S of the appearance numbers C of the target object from S=S+C, and the process returns to the step S35. The process shown in
The intensity display part 14 displays the intensity value of each coordinate computed by the appearing frequency computing part 13 on a display part of the general purpose computer described above, as luminance information (intensity value) of the corresponding pixel of the intensity image that is output. In the case where the appearing frequency computing part 13 carries out the steps ST1 through ST5 described above, the intensity display part 14 carries out an intensity display process including the following step ST6.
Step ST6: Set the intensity value I of the target object at each coordinate as the luminance information [0 to 255] of the corresponding pixel of the intensity image.
It is possible to independently make the intensity display of the appearing frequency of the detected target objects for each prespecified condition by separating the detected target objects according to each prespecified condition. More particularly, when considering each direction of the target object, for example, it is possible to independently make the intensity display with respect to the target objects for each prespecified condition, by preparing three kinds of intensity displays, namely, an intensity display representing the appearing frequency in a rightward direction, an intensity display representing the appearing frequency in a frontward direction, and an intensity display representing the appearing frequency in a leftward direction.
It is possible to independently make the intensity display of the appearing frequency of the detected target objects in a different color for each prespecified condition, by separating the detected target objects according to each prespecified condition and assigning a different color to each prespecified condition. More particularly, when considering each direction of the target object, for example, it is possible to independently make the intensity display with respect to the target objects in a different color for each prespecified condition, by displaying the appearing frequency in the rightward direction by a red intensity, displaying the appearing frequency in the frontward direction by a blue intensity, and displaying the appearing frequency in the leftward direction by a green intensity.
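A minimal sketch of such a color-coded display, assuming three per-direction appearance-rate maps of equal size with values in [0, 1] and the red/blue/green assignment described above, could look as follows.

```python
import numpy as np

def color_coded_intensity(rate_right, rate_front, rate_left):
    """Combine per-direction appearance-rate maps (same shape, values in [0, 1])
    into one RGB intensity image: rightward -> red, frontward -> blue, leftward -> green."""
    h, w = rate_right.shape
    rgb = np.zeros((h, w, 3), dtype=np.uint8)
    rgb[..., 0] = (rate_right * 255).astype(np.uint8)  # red channel
    rgb[..., 2] = (rate_front * 255).astype(np.uint8)  # blue channel
    rgb[..., 1] = (rate_left * 255).astype(np.uint8)   # green channel
    return rgb
```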
The image arranging part 15 automatically classifies and arranges the displayed intensity images based on an arbitrary image feature value. A method of automatically classifying and arranging not only the intensity image but also images in general is proposed in Susumu Endo et al., “MIRACLES: Multimedia Information Retrieval, Classification, and Exploration System”, In Proc. of IEEE International Conference on Multimedia and Expo (ICME2002), 2002, for example. According to this classifying and arranging method, a specified image feature value (color, texture, shape, etc.) is automatically extracted from each image, a distance between the extracted image feature values of an arbitrarily selected image and each image is computed, and images similar to the selected image (having a small distance) are displayed in the order of similarity. Of course, the image arranging part 15 may classify and arrange the intensity image by methods other than the classifying and arranging method described above. In addition, since the intensity image that is displayed by the intensity display part 14 is a subset of general images, it is possible to input the intensity image to the image arranging part 15 which employs a known classifying and arranging method.
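The classifying and arranging idea can be sketched, for illustration, with a simple luminance-histogram feature and a Euclidean distance; the actual MIRACLES system may use different feature values and distance measures.

```python
import numpy as np

def image_feature(image):
    """A simple illustrative feature: a normalized 32-bin luminance histogram."""
    hist, _ = np.histogram(image, bins=32, range=(0, 256))
    hist = hist.astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist

def arrange_by_similarity(selected_image, images):
    """Return indices of `images` sorted by feature distance to `selected_image`
    (most similar first)."""
    ref = image_feature(selected_image)
    distances = [np.linalg.norm(image_feature(img) - ref) for img in images]
    return sorted(range(len(images)), key=lambda i: distances[i])
```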
The position data holding part 12 and the appearing frequency computing part 13 can independently count the appearing frequency of the detected target objects for each prespecified condition by separating the detected target objects according to each prespecified condition, and the intensity display part 14 can use a different color for each prespecified condition when displaying the appearing frequency as the intensity information. Accordingly, it is possible to grasp the appearing frequency for each direction of the target object, for example. It is possible to grasp the appearing frequency in more detail by specifying the direction or orientation of the target object as the condition; for example, it may be found that the appearing frequency of the target object facing the rightward direction is higher than the appearing frequency of the target object facing the frontward direction.
On the other hand, if the decision result in the step S53 is YES, a step S56 sorts all images according to the order (size) of the distance. In addition, a step S57 displays all images on the display part in the sorted order, and the process ends. In this embodiment, the image arranging part 15 carries out the classification and arrangement based on the intensity images displayed on the display part by the intensity display part 14. However, it is of course possible to directly classify and arrange the intensity images output from the intensity display part 14 and to display the sorted result on the display part.
The image arranging part 15 automatically classifies and arranges the intensity information obtained from the intensity display part 14 based on the image feature value, and displays the sorted result. For this reason, by classifying and arranging the intensity images similar to a certain intensity image A, for example, it is possible to efficiently search for movies in which the appearing frequency of the target object is similar to that of the movie corresponding to the certain intensity image A. Moreover, it is also possible to grasp the number of intensity images having similarities to a certain extent, and to grasp the similarities of a group of intensity images that are arranged locally.
The intensity display part 14 displays the appearing frequency of the target object by the intensity value of the pixel at each position, based on the information computed in the appearing frequency computing part 13. In other words, since the intensity value representing the appearing frequency of the target object is automatically computed based on the detection result of the target object with respect to each frame of the movie, it is possible to represent the appearing position of the target object appearing within the movie by an intensity distribution. For this reason, by viewing the intensity distribution corresponding to each movie, the user can easily and visually grasp the appearing position of the target object within the movie. Hence, the image arranging part 15 may be omitted in a case where the user carries out the classification and arrangement of the intensity information by viewing the intensity distribution corresponding to each movie.
The following are examples of the cases where it is necessary to grasp the appearing position of the face of the person, as the target object, with respect to the movie.
C1) Genre Classification of Movies: With respect to a large number of movies, the present invention can gather movies having similar intensity information and classify such movies in the same genre. For example, in the case of a group of movies of educational programs having many scenes in which one lecturer is lecturing at the center of the screen, the intensity display that is made has a high intensity in the vicinity of the center of the screen for each movie, as shown in
C2) Analysis of Commercials and Programs: As a method of analyzing the commercials and the programs, it is possible to utilize the present invention for finding the features and knowledge common to the movies, with respect to the commercials and the programs having a high rating. For example, the knowledge common to the movies may be that “the appearing frequency of the face is high at the center portion of the screen for the commercials having a high rating”.
C3) Analyzing Style: The intensity display of the present invention may be utilized as one feature value that is used to extract the knowledge common to a group of films (movies) directed by a certain movie director. For example, the knowledge common to the group of movies may be that “there is a strong tendency for the face of the person to appear uniformly in the entire screen for the movies shot by a director Y”.
Therefore, because the present invention can represent the appearing frequency of a particular target object appearing within the movie by an intensity distribution (intensity image), it is possible to easily grasp the appearing position of the target object appearing within the movie. In addition, since the intensity image reflecting the appearing tendency of the target object is obtained by employing the present invention, it is possible to automatically classify and arrange the intensity image by inputting the obtained intensity image to the image arranging part. Hence, it is possible, for example, to grasp the similarity and the like related to the appearing tendency of the target object with respect to a plurality of movies.
Thus, according to the present invention, it is possible to classify the genre of the movies, analyze the commercials and the programs, and analyze the style by using, as a new point of view, the appearing position of the target object (for example, the face of the person) within the movie.
Further, the present invention is not limited to these embodiments, but various variations and modifications may be made without departing from the scope of the present invention.
Related Application Data
Parent: PCT/JP03/00735, Jan. 2003, US
Child: 11077195, Mar. 2005, US