This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2014-194489, filed on Sep. 24, 2014; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a system and a method for specifying an image capturing unit, and a non-transitory computer readable medium thereof.
In a surveillance camera system to which many surveillance cameras are connected, a device for selecting a specific surveillance camera is well known, for example, in order to confirm the video of that camera or to adjust a parameter thereof such as an angle of view.
For example, in the technique disclosed in JP Pub. No. 2009-4977, position coordinates of a portable device are measured at a plurality of base stations based on the radio field intensity received from the portable device, and a surveillance camera corresponding to the measured position is selected from a plurality of surveillance cameras. Furthermore, in the video distribution system disclosed in JP Pub. No. 2004-7283, a video including a user's desired scene can be acquired and distributed at high resolution without forcing the user to perform troublesome operations.
However, in these techniques, the physical positions of all surveillance cameras need to be measured and recorded in advance. Furthermore, if the positions cannot be measured in advance, a bar code for specifying the positional relationship needs to be used.
According to one embodiment, an image capturing unit-specifying system includes a portable device, a plurality of second image capturing units, an extraction unit, a calculation unit, and a specifying unit. The portable device includes a first image capturing unit that captures a first image of a person, and a sending unit that sends the first image. The plurality of second image capturing units captures second images. The extraction unit extracts a first feature of the first image and a second feature of each of the second images. The calculation unit calculates, for each of the second images, a similarity based on the first feature and the second feature of that second image. If the similarity for a second image is larger than a predetermined threshold, the specifying unit specifies, from among the plurality of second image capturing units, the second image capturing unit that captured that second image.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
The portable device 100 includes a first camera (the first image capturing unit) 10 capable of capturing a still image (hereinafter simply called "an image"), and the sending unit 11 that sends the image to the server 200. The portable device 100 is connected to a network, and is able to communicate with the second image capturing units (explained afterwards). For example, a user captures himself or herself with the first camera 10. This captured image (the first image) is sent to the server 200. Hereinafter, an example in which the user captures himself or herself will be explained. However, the capturing target is not limited to the user. For example, a plurality of users may capture one another. Namely, the person who captures the first image with the portable device 100 may be different from the user who is captured.
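As a minimal illustrative sketch (not part of the embodiment itself), the portable device 100 side could be realized as follows; the server endpoint URL and the upload field name are hypothetical, and OpenCV and the requests library are assumed to be available:

```python
import cv2
import requests

# Hypothetical endpoint of the receiving unit 6 on the server 200
SERVER_URL = "http://server200.example/upload"

def capture_and_send() -> None:
    # Capture one still image with the first camera 10
    camera = cv2.VideoCapture(0)
    ok, frame = camera.read()
    camera.release()
    if not ok:
        raise RuntimeError("capture by the first camera failed")
    # Encode the frame as JPEG and send it (the first image) to the server 200
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    requests.post(SERVER_URL, files={"first_image": jpeg.tobytes()})

if __name__ == "__main__":
    capture_and_send()
```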
In the server 200, the receiving unit 6 receives the images captured by the first camera 10. The received images are sent to the extraction unit 3. Furthermore, the received images may be temporarily stored in the storage unit 7.
A plurality of second cameras (the second image capturing units) 20a˜20x is located. For example, a plurality of surveillance cameras is installed in one specific building or in a plurality of specific buildings, and each surveillance camera captures the inside or the surroundings of the building. More specifically, each camera can output a video signal. It may be a network camera connected via a network, or an analog camera that sends a composite video signal. For example, a security system in which a plurality of cameras cooperates may be used. Here, an example in which the second camera 20a captures the user himself or herself will be explained. However, the capturing target is not limited to the user. For example, a plurality of users may capture one another. Namely, the person who operates the capturing may be different from the user who is captured. The second camera 20a captures the user. This captured image (the second image) is sent to the extraction unit 3. Furthermore, identification information associating the second image with an ID of the second camera 20a is sent to the specifying unit 5. The second image may be temporarily stored in the storage unit 7.
The extraction unit 3 extracts a first feature from the first image and a second feature from each second image. For example, a feature is extracted from the face region of each of the first image and the second images as the first feature and the second feature, respectively. The extracted features are sent to the calculation unit 4.
The calculation unit 4 calculates a similarity between the feature of the first image and the feature of each second image. The calculated similarity is sent to the specifying unit 5. For example, in the case of features of face regions, the face in the first image is matched (compared) with the face in each second image.
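For instance, if each feature is extracted as a fixed-length vector, the similarity may be computed as a cosine similarity. The following is one possible sketch, not the only similarity measure the embodiment permits:

```python
import numpy as np

def cosine_similarity(feature_a: np.ndarray, feature_b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; a larger value means more similar features."""
    denom = float(np.linalg.norm(feature_a) * np.linalg.norm(feature_b))
    if denom == 0.0:
        return 0.0
    return float(np.dot(feature_a, feature_b)) / denom
```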
If the calculated similarity is larger than a predetermined threshold, the specifying unit 5 specifies the second camera 20a which has captured the second image corresponding to that second feature. The predetermined threshold may be determined by prior training. By using the identification information (acquired from the second camera 20a) and the calculated similarity, the specifying unit 5 specifies the second camera 20a from among the plurality of second cameras 20a˜20x.
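Building on the cosine_similarity sketch above, the specifying logic might be summarized as follows; the mapping from camera IDs to features is a hypothetical data structure, and the threshold is assumed to have been determined by prior training as stated:

```python
from typing import Dict, Optional
import numpy as np

def specify_camera(first_feature: np.ndarray,
                   second_features: Dict[str, np.ndarray],
                   threshold: float) -> Optional[str]:
    """Return the ID of the second camera whose image is most similar to the
    first image, or None if no similarity exceeds the threshold."""
    best_id: Optional[str] = None
    best_sim = threshold
    for camera_id, feature in second_features.items():
        sim = cosine_similarity(first_feature, feature)
        if sim > best_sim:
            best_id, best_sim = camera_id, sim
    return best_id
```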
The output unit 8 outputs a result specifying the second camera 20a. Information associating this result with the ID of the second camera 20a (among the plurality of second cameras 20a˜20x) may be newly created and stored into the storage unit 7.
First, the first camera 10 and the second camera 20a each capture the user (S101). As the target captured by the first camera 10 and the second camera 20a, a person including a face region is preferable. For example, after an image of the user's face region is captured by the first camera 10 of the portable device 100, an image including the user's face region is captured by the second camera 20a while the user faces the capturing direction of the second camera 20a.
As the second camera 20a, a fixed surveillance camera is assumed. For example, the case where a person image including an operator's face is utilized will be explained by referring to
The type of the camera is not limited to a surveillance camera. Image capturing by the second camera 20a may be performed at a timing convenient for the user by using a remote controller. Furthermore, while the image capturing unit-specifying system 1 is operating, the second camera 20a may capture images at fixed intervals, for example, one image per second or one image per minute. While the second camera 20a is capturing images at fixed intervals, the captured images are temporarily stored into the storage unit 7 until the extraction unit 3 starts acquiring them.
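A fixed-interval capture loop with temporary storage could look like the following sketch, where the storage unit 7 is modeled simply as a Python list of (capture time, image) pairs:

```python
import time

def capture_at_fixed_intervals(camera, storage: list,
                               interval_seconds: float = 1.0) -> None:
    """Capture one image per interval and keep it, with its capture time,
    until the extraction unit 3 starts acquiring images."""
    while True:  # runs for as long as the system 1 is being operated
        ok, frame = camera.read()
        if ok:
            storage.append((time.time(), frame))
        time.sleep(interval_seconds)
```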
Next, the face region is extracted from the respective images captured by the first camera 10 and the second camera 20a (S102). After the extraction unit 3 acquires the first image captured by the first camera 10, the extraction unit 3 acquires, from the images stored in the storage unit 7, the second image captured by the second camera 20a. Here, among the images stored in the storage unit 7, the second image is the one whose capture time by the second camera 20a is nearest to the time when the first image was captured by the first camera 10.
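Selecting the stored second image whose capture time is nearest to that of the first image is a simple minimization; this sketch assumes the (capture time, image) pairs of the previous sketch:

```python
def nearest_in_time(stored_images: list, first_capture_time: float):
    """stored_images: list of (capture_time, image) pairs from the storage unit 7.
    Returns the image captured nearest in time to the first image."""
    return min(stored_images, key=lambda pair: abs(pair[0] - first_capture_time))[1]
```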
For example, detection of the face region is performed as follows. For each pair of pixel regions in the acquired image, a difference (Haar-like feature) between the pixel intensities of the pair is calculated. By comparing the difference with a threshold (determined by prior training), it is decided whether the face region is included in a region of interest. Moreover, in order to evaluate the correlation (co-occurrence) among a plurality of facial features, the face region can be decided with higher accuracy by combining threshold comparisons of a plurality of such differences (Joint Haar-like features). By making this decision while changing the position and size of the target region in the image, the position and size of the face region can be detected.
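Detectors of this kind are available off the shelf. The following sketch uses OpenCV's pretrained frontal-face cascade as a stand-in for the trained detector described above; it is an illustration, not the embodiment's own detector:

```python
import cv2

# OpenCV ships a cascade classifier pretrained on Haar-like features
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_regions(image):
    """Return (x, y, w, h) rectangles of detected face regions. The detector
    internally scans positions and scales, as described above."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```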
In the above explanation, the face region was described as an example. In the case of a whole body of a person, for example, a whole-body detector is previously trained, and the corresponding body region is extracted by that detector. By using the face or the whole body of the person, the region to be matched can always be specified. As a result, even if the image-capturing direction and the background of the first camera 10 differ from those of the second camera 20a, the first camera 10 and the second camera 20a can be easily utilized.
Next, the calculation unit 4 calculates a similarity between the features of the respective face regions (S103). In the case of utilizing a target region that is explicitly decidable (such as a face image), a face recognition technique can be utilized. For example, if the image includes a whole body of a person, the equivalent region is extracted from the image, and its similarity is calculated using a template matching technique. As a result of the similarity calculation, a second image having a similarity larger than a predetermined threshold is selected. The ID of the second camera 20a which has captured the selected second image is set as a candidate (hereinafter called "a selection candidate").
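For the template matching case, a normalized cross-correlation score can serve as the similarity; this sketch assumes the region extracted from the first image is no larger than the second image:

```python
import cv2

def template_similarity(second_image, first_region) -> float:
    """Slide the region extracted from the first image over the second image
    and return the best normalized correlation score in [-1, 1]."""
    result = cv2.matchTemplate(second_image, first_region, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return float(max_val)
```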
Next, the specifying unit 5 specifies the second camera based on the similarity calculation result (S104). In an example of
As mentioned above, in the image capturing unit-specifying system 1, by matching an image sent from the portable device 100 with the respective images captured by the surveillance cameras, one surveillance camera can be specified without measuring the positions of the surveillance cameras. Specifically, by using information acquired from person-video information, the user can easily specify the position of his/her desired camera.
<In Case of Using Moving Images>
If moving images can be acquired by the first camera 10 and the second cameras 20a˜20x, the information used by the extraction unit 3 and the calculation unit 4 may be features extracted from the moving images.
For example, a feature vector (a set of features) based on a movement and a posture of a specific part of the user's body may be utilized.
In this case, by detecting the specific part from the acquired moving images and tracing the position of the specific part in time series, a coincidence degree between the locus of the position and a predetermined pattern can be used as the similarity.
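One simple way to turn two loci into a coincidence degree is to normalize both for translation and scale, resample them to a common length, and map their distance into (0, 1]; this is a sketch of one possibility (dynamic time warping would be a more robust alternative):

```python
import numpy as np

def trajectory_coincidence(traced_positions, pattern) -> float:
    """traced_positions, pattern: (N, 2) arrays of (x, y) positions in time
    series. Returns a coincidence degree in (0, 1]; 1.0 means identical loci."""
    def normalize(p):
        p = np.asarray(p, dtype=float)
        p = p - p.mean(axis=0)                # remove translation
        scale = np.linalg.norm(p)
        return p / scale if scale > 0 else p  # remove scale
    a, b = normalize(traced_positions), normalize(pattern)
    n = min(len(a), len(b))
    # Resample both loci to the same length by index selection (a simplification)
    idx_a = np.linspace(0, len(a) - 1, n).astype(int)
    idx_b = np.linspace(0, len(b) - 1, n).astype(int)
    dist = float(np.linalg.norm(a[idx_a] - b[idx_b]))
    return 1.0 / (1.0 + dist)
```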
(The First Modification)
In the image capturing unit-specifying system 1 of the first embodiment, the second camera 20 is specified via the server 200. However, as shown in
(Hardware Component)
A program executed by the system 1 is provided by being preinstalled into a ROM. Furthermore, this program may be provided by being stored in a computer-readable storage medium (such as a CD-ROM, a CD-R, a memory card, a DVD, or a flexible disk (FD)) as a file in an installable or executable format. Furthermore, this program may be provided by being stored on a computer connected to a network such as the Internet and downloaded via the network.
In the first embodiment, the program executable by the system 1 comprises modules that realize each above-mentioned unit on a computer. As actual hardware, for example, the control device 601 reads the program from the external storage device 603 into the storage device 602 and executes it. As a result, each above-mentioned unit is realized on the computer.
As mentioned above, in the first embodiment, by matching an image sent from the portable device with the respective images captured by the surveillance cameras, one surveillance camera can be specified without measuring the positions of the surveillance cameras. Specifically, by using information acquired from person-video information, the user can easily specify the position of his/her desired camera.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
For example, as to each step of the flow chart shown in