The present invention relates to an image conversion apparatus, a camera, a video system, an image conversion method and a recording medium including a program recorded therein that convert an image in such a way that a line-of-sight direction is changed.
There has been known a technique that converts an image captured by a camera into an image captured from a virtual viewpoint different from a viewpoint of capturing the original image.
PTL 1 discloses a technique that forms, using the abovementioned image conversion technique, an overview image of a wide range by using a plurality of images captured by a plurality of cameras. In this technique, the plurality of images captured by the plurality of cameras installed at different positions are changed to images captured from the same viewpoint, and the plurality of images are combined into a single image to form the above-described overview image of a wide range.
PTL 2 discloses a technique, used when an image captured by a main camera includes a blind zone, that fills the blind zone with an image by using the above-described image conversion technique. In this technique, a sub-camera different from the main camera captures an image of a range for filling the blind zone. The viewpoint of the captured image is then converted to match the viewpoint of the main camera, and the image in the range overlapping with the blind zone is cut out and combined.
Let us consider a case where a video conference is held, for example, while participants are seated on three sides of a table and captured by a camera from the remaining side, and the captured image is output and displayed at a different site of the video conference. In this case, the participant positioned in front of the camera is in a proper display state, facing the front in the image. However, the participants positioned to the left and right of the camera are in a somewhat improper display state for the video conference, turning away from the camera in the image. The image of all the participants would be displayed properly if the left region of the image were turned left and the right region of the image were turned right.
However, the image conversion technique according to the related art converts the whole of an image as if it were viewed from a single virtual viewpoint. Therefore, this technique cannot handle, for example, a situation where the right side of the image is to be turned 30 degrees and the left side of the image is to be turned 20 degrees.
An object of the present invention is to provide an image conversion apparatus, a camera, a video system, an image conversion method and a recording medium including a program recorded therein that make it possible to flexibly handle a case where one image includes a plurality of regions desired to be turned in different directions and thus to convert the image in a desired way.
An image conversion apparatus according to an aspect of the present invention includes: a region dividing section that divides one input image into a plurality of regions; and an image conversion section that converts an image of at least one of the plurality of regions into an image captured from a virtual viewpoint different from a viewpoint of capturing the input image, the plurality of regions being obtained by dividing the one input image by the region dividing section.
According to the present invention, it is possible to convert an image in such a way that one input image is divided into a plurality of regions and a line of sight is changed for each of the regions. Therefore, even in a case where one image includes a plurality of regions desired to be turned in different directions, it is possible to flexibly handle the case.
Hereinafter, embodiments of the present invention will be described based on the drawings.
Camera apparatus 1 in the embodiment includes: image input section 12 that includes a camera lens and an imaging device and that inputs image data of a captured image; region dividing section 13 that divides the captured image into a plurality of regions; and image conversion section 14 that performs image conversion on the image of each of the regions obtained by the dividing process. In addition, camera apparatus 1 includes: combining section 15 that performs image combining and output; and shape model database 16 that stores a three-dimensional shape model of a human face, and shape models of a wall, a desk and the like in a room. Camera apparatus 1 further includes: region setting section 17 that sets the regions to be divided; and line-of-sight setting section 18 that performs setting for the image conversion.
Region setting section 17 has a plurality of operation buttons and sets a plurality of regions in the captured image upon reception of an input operation of a user. For example, arbitrary line segments in the captured image are output and displayed by an input operation of the user, and region setting section 17 sets a range surrounded by the line segments or by the outline of the captured image as one region. Alternatively, a plurality of points is input in the captured image by the input operation of the user, and region setting section 17 obtains regions divided by broken lines having the input points as apexes, or Voronoi regions having the input points as generators; region setting section 17 may then set these Voronoi regions as the plurality of regions. The information on the set regions is transmitted to region dividing section 13 and line-of-sight setting section 18.
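The Voronoi-based region setting described above can be sketched as follows (a minimal illustration assuming NumPy; the image size and generator points are hypothetical): each pixel is assigned to its nearest user-input point, and the pixels sharing a nearest point form one region.

```python
import numpy as np

def voronoi_labels(height, width, generators):
    """Assign each pixel to the nearest generator point (Voronoi region)."""
    ys, xs = np.mgrid[0:height, 0:width]
    g = np.array(generators, float)
    # Squared distance from every pixel to every generator: shape (H, W, N)
    d2 = (ys[..., None] - g[:, 0]) ** 2 + (xs[..., None] - g[:, 1]) ** 2
    # Label map: index of the closest generator for each pixel
    return d2.argmin(axis=-1)

# Example: three user-input points (y, x) in a 120x160 image
labels = voronoi_labels(120, 160, [(60, 20), (60, 80), (60, 140)])
```

Each connected block of equal labels then corresponds to one region handed to region dividing section 13.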
Upon reception of information on the set regions from region setting section 17, region dividing section 13 performs a dividing process of the image data supplied from image input section 12 so that the captured image is divided into the set regions. Then, region dividing section 13 generates image data for each of the regions and transmits the image data to image conversion section 14.
Line-of-sight setting section 18 has a plurality of operation buttons and sets a line-of-sight direction of the conversion result (the direction corresponding to the line of sight after image conversion) for each of the plurality of regions in the captured image by receiving the input operation of the user. For example, line-of-sight setting section 18 displays an arrow for each of the set regions of the captured image and makes the direction of the arrow changeable in a three-dimensional manner by the input operation of the user. Line-of-sight setting section 18 then sets the finally determined direction of the arrow as the line-of-sight direction of the conversion result. The setting information of the line-of-sight direction for each of the regions is transmitted to image conversion section 14.
Image conversion section 14 performs image conversion processing on the image data of each of the regions so that the image having the optical axis of the camera lens as the line-of-sight direction is converted into an image viewed from the line-of-sight direction set for each of the regions. In a case of wide-angle capturing, the line-of-sight directions of the left and right regions of the image deviate slightly from the optical axis of the camera lens according to the viewing angle. Therefore, information indicating in what shape, and how, the image before conversion is placed in three dimensions is required to accurately perform the image conversion in which the above-described line-of-sight directions are changed. Image conversion section 14 is configured to perform simplified image conversion by using the three-dimensional face model only for a face portion of a person, which requires accuracy, and by handling the other portions as models arranged along a uniform plane. The direction of the uniform plane can be obtained by, for example, extracting line segments or polygons which can specify a direction in the image of each of the regions by image analysis, and estimating an average direction of the line segments or polygons. In addition, image conversion section 14 may be configured to cause the user to input the direction of the plane.
Image conversion section 14 searches for a face portion of a person in the image of each of the regions by matching processing or the like, and when a face portion is present, further specifies the direction of the face from the eyes, nose, mouth, contour and the like. Image conversion section 14 then associates the face image with a three-dimensional shape using the three-dimensional shape data of shape model database 16, and associates the other portions with the plane whose direction is estimated as described above. By this processing, image conversion section 14 can associate each pixel of the image data with a coordinate point in a virtual three-dimensional mapping space. Next, image conversion section 14 converts the image mapped in the virtual three-dimensional space into an image captured from the newly set line-of-sight direction, based on the setting information supplied from line-of-sight setting section 18, and generates image data after conversion. By such image processing, image conversion section 14 can convert the image of each region from the line-of-sight direction of the camera during capturing into an image viewed from the newly set line-of-sight direction, with the face portion of a person converted relatively accurately and the other portions converted roughly (as plane models).
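The plane-model part of this conversion can be illustrated roughly as follows (a sketch under simplifying assumptions, not the disclosed implementation: a pinhole camera with a hypothetical focal length, all non-face pixels lying on a single frontal plane at a known depth, and a change of line-of-sight direction modeled as rotation about the vertical axis):

```python
import numpy as np

def reproject_plane(points_uv, f, depth, yaw_deg):
    """Back-project image points onto a frontal plane at the given depth,
    rotate the plane about the vertical (y) axis, and re-project them."""
    u, v = np.asarray(points_uv, float).T
    # Pinhole back-projection: pixel -> 3D point on the plane z = depth
    X = np.stack([u * depth / f, v * depth / f, np.full_like(u, depth)])
    t = np.radians(yaw_deg)
    # Rotation about the y axis (new virtual line-of-sight direction)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(t), 0.0, np.cos(t)]])
    Xr = R @ X
    # Perspective projection back to image coordinates
    return np.stack([f * Xr[0] / Xr[2], f * Xr[1] / Xr[2]], axis=1)

# A yaw of 0 degrees leaves the points unchanged
pts = reproject_plane([(10.0, 5.0)], f=500.0, depth=2.0, yaw_deg=0.0)
```

In the actual apparatus the face pixels would be mapped onto the three-dimensional face model of shape model database 16 instead of this single plane; only the rough treatment of the remaining portions follows this sketch.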
Combining section 15 arranges the image data of the plurality of regions supplied from image conversion section 14 in the same arrangement as when the dividing process was performed, and combines the image data into one piece of image data. The combined image data is converted into display data for output and display, and is then output. When the image data of the plurality of regions obtained by the dividing process is individually output and displayed on a plurality of displays, combining section 15 may be configured to convert the image data of each of these regions into display data suitable for the individual displays and output the converted data.
The video conference system in
Next, the operation of the above-described video conference system will be described.
First, the user sets a region through region setting section 17. Here, as illustrated in
Next, the user sets a line-of-sight direction to be converted by line-of-sight setting section 18. For example, as illustrated in
Next, image conversion section 14 performs image conversion processing of rotating the image of A plane S1, the three-dimensional shapes of the face portions of persons P1 and P2, and the image of the background wall of the persons by rotational angle “−θA” with respect to the image data of the left region. In addition, image conversion section 14 performs image conversion processing of rotating the image of C plane S3, the three-dimensional shapes of the face portions of persons P5 and P6, and the image of the background wall of the persons by rotational angle “θC” with respect to the image data of the right region. Image conversion section 14 transmits the image data of the center region in which the line-of-sight direction is not changed to combining section 15 without any change.
Then, the image data of such a plurality of regions after conversion is combined by combining section 15 to generate an image as illustrated in
When the above-described combining processing is performed, combining section 15 may perform smoothing processing of smoothing boundaries between the images of the regions, or processing of matching a position of a characteristic object. For example, in the case of
Since combining section 15 combines the image data of each region converted by image conversion section 14 at the same size (shape) as the image data before conversion, combining section 15 selects the region to be used from the image data after conversion and then combines the image data. When the image data after conversion has a portion whose size is insufficient, combining section 15 may be configured to perform left-right inversion (make a mirror image) of a neighboring image and use it as image data for the insufficient portion.
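The mirror-image filling of an insufficient portion might look like the following sketch (assuming NumPy and a single-channel region; the sizes are hypothetical):

```python
import numpy as np

def mirror_fill(region, target_width):
    """Pad a converted region to the target width by mirroring its right
    edge, as a stand-in for image data lost during the conversion."""
    h, w = region.shape[:2]
    if w >= target_width:
        return region[:, :target_width]
    deficit = target_width - w
    # Take the rightmost columns and invert them left-right (mirror image)
    mirrored = region[:, -deficit:][:, ::-1]
    return np.concatenate([region, mirrored], axis=1)

# A 3x4 region padded to width 6: the last two columns are mirrored
patch = mirror_fill(np.arange(12).reshape(3, 4), target_width=6)
```

The same idea applies symmetrically to a deficit on the left edge.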
On the side of viewing the image transmitted from camera apparatus 1, as illustrated in
Then, the image data mainly including the corresponding image of each region is output and displayed on each display 21. When determining which displays 21 are the output and display destinations, combining section 15 may refer to the arrangement information of each display 21 and the position information of each division region in the image. Combining section 15 then transmits, to each display 21, the image data including the division regions corresponding to the arrangement information of that display 21.
By outputting the images from a plurality of displays 21, as illustrated in
For example, face direction detection section 19 starts processing of the flowchart illustrated in
When the processing starts, first, face direction detection section 19 acquires the captured image at this point from image input section 12 to detect a face portion of a person in the captured image by matching processing (Step J1: image search processing). As shown in
Next, face direction detection section 19 analyzes the contours of the detected face portions and the arrangement of the eyes, nose and mouth to detect the direction of each face (Step J2: direction detection processing). As illustrated in
Next, face direction detection section 19 categorizes the plurality of faces into groups (Step J3) based on the positions of the detected faces and the directions of the detected faces in the image. Specifically, face direction detection section 19 groups a plurality of faces in which a difference in the directions of the detected faces is within a predetermined range (for example, within 30°) and the positions of the detected faces are arranged in sequence. In the case of image G1, since a difference between the face directions of two persons P1 and P2 consecutively positioned from the left side is within 30°, and a difference between the face direction of person P3 who is the third person next to person P2 and the face direction of person P2 exceeds 30°, face direction detection section 19 categorizes the faces in detection frames f1 and f2 into a first group. By repeating the same processing, face direction detection section 19 categorizes the faces in detection frames f3 and f4 into a second group and categorizes the faces in detection frames f5 and f6 into a third group.
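The grouping rule described above — merge consecutively positioned faces whose detected directions differ by no more than a threshold — can be sketched as follows (the positions and yaw angles are hypothetical):

```python
def group_faces(faces, max_diff=30.0):
    """Group consecutively positioned faces whose yaw angles differ by at
    most max_diff degrees. `faces` is a list of (x_position, yaw_degrees)
    tuples, assumed sorted left to right."""
    groups = [[faces[0]]]
    for prev, cur in zip(faces, faces[1:]):
        if abs(cur[1] - prev[1]) <= max_diff:
            groups[-1].append(cur)   # same group as the left neighbour
        else:
            groups.append([cur])     # direction jump: start a new group
    return groups

# Six faces as in the example: two turned right, two frontal, two turned left
faces = [(40, 35.0), (90, 30.0), (150, -2.0), (200, -5.0),
         (260, -38.0), (310, -35.0)]
groups = group_faces(faces)
```

With these angles the faces fall into three groups of two, matching the first, second and third groups of detection frames f1 to f6.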
After arranging the faces into the groups, face direction detection section 19 divides the image into regions corresponding to each group (Step J4: region setting processing). The dividing process can be performed by, for example, using a Voronoi division algorithm. That is, face direction detection section 19 performs the dividing process for each face using the center of each of the detected faces as a generator, so that each point on the image belongs to the closest generator. Further, face direction detection section 19 merges the regions of the plurality of faces belonging to the same group to set them as regions R1 to R3 of the corresponding groups (refer to
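Combining the per-face Voronoi division with the group labels from Step J3 might be sketched as follows (assuming NumPy; the face centers and group assignment are hypothetical):

```python
import numpy as np

def group_regions(height, width, centers, group_of):
    """Voronoi-divide the image using face centers as generators, then
    relabel each cell with its face's group index so that cells of the
    same group merge into one region."""
    ys, xs = np.mgrid[0:height, 0:width]
    c = np.array(centers, float)
    d2 = (ys[..., None] - c[:, 0]) ** 2 + (xs[..., None] - c[:, 1]) ** 2
    nearest_face = d2.argmin(axis=-1)          # per-face Voronoi label map
    return np.array(group_of)[nearest_face]    # merge cells group by group

# Four face centers (y, x); the first two belong to group 0, the rest to group 1
regions = group_regions(60, 200,
                        [(30, 20), (30, 70), (30, 130), (30, 180)],
                        [0, 0, 1, 1])
```

Adjacent Voronoi cells that share a group index become a single connected region, corresponding to regions R1 to R3 in the embodiment.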
After the dividing process of a region, face direction detection section 19 performs processing of determining a line-of-sight direction of conversion result for each group (Step J5: line of sight setting processing). As illustrated in
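Although the exact rule is not spelled out in the text above, one plausible choice for Step J5 is to set each group's conversion angle to the negative of its average detected face direction, so that each group faces the front after conversion. A sketch (the yaw angles are hypothetical):

```python
def conversion_angles(groups):
    """For each group of detected yaw angles (degrees), return the rotation
    that would turn the group's average face direction to the front."""
    return [-sum(yaws) / len(yaws) for yaws in groups]

# Left group turned about +30 degrees, center frontal, right group about -30
angles = conversion_angles([[35.0, 25.0], [2.0, -2.0], [-25.0, -35.0]])
```

These per-group angles would then play the role of the rotational angles such as "−θA" and "θC" handed to image conversion section 14.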
In Embodiment 2, the operations are performed in the same manner as in Embodiment 1 except the region setting of the image and new line-of-sight direction setting. In Embodiment 2, the input operation of the user required for the region setting and the new line-of-sight direction setting can be considerably reduced.
As described above, in camera apparatuses 1 and 1A according to Embodiments 1 and 2, and the video conference system, image conversion processing can be performed on one input image by dividing the image into a plurality of regions and converting the divided images into images having different line-of-sight directions, respectively. According to this system, when one input image includes a plurality of subjects directed in various directions, the arrangement and the directions of the subjects are flexibly handled, and the image can be converted into an easily viewable image as a whole. Alternatively, the image can be converted into an image deformed in a desired way in various aspects.
In the above-described embodiments, an example in which the video system of the present invention is applied to a video conference system has been described, but the configuration of the video system may be implemented in one digital still camera or one digital video camera. That is, the configuration to input an image, to divide an image into a plurality of regions, to perform image conversion for changing a line-of-sight direction, and to output and display an image may be included and performed in one apparatus. In addition, an apparatus obtained by removing image input section 12 from camera apparatuses 1 and 1A in the embodiments may be separately provided as an image conversion apparatus.
Further, in Embodiment 2, an example in which image conversion is performed by categorizing a plurality of faces into groups and changing a line-of-sight direction in each region of each group has been described. However, the image conversion section may be configured to set each of face detection frames f1 to f6 as an individual region and to individually perform image conversion (line-of-sight direction conversion) only on the face portions.
In addition, line-of-sight direction conversion may be performed on the background portions according to the angles of the faces. Since a camera image is used as input and, unlike CG, there is no image data of the lateral and back sides, a modest conversion (with a degree smaller than the degree of the expected conversion) may be suitable in some cases. In particular, when the conversion angle exceeds 30°, the modest conversion may be performed.
In addition, line-of-sight setting section 18 does not necessarily change all the lines of sight. The line-of-sight setting section may be configured to convert the viewpoint of one region while maintaining the original viewpoints of the other regions without performing viewpoint conversion. In this way, two lines of sight are still changed relative to each other.
Further, region setting section 17, line-of-sight setting section 18, and face direction detection section 19 are not necessarily included in camera apparatus 1 (or camera apparatus 1A), and these functions may be provided through a network. For example, the video conference apparatus of the connection destination of the video conference system includes the image input section, the region dividing section, the image conversion section, the shape model DB, and the combining section. In addition, the video conference apparatus of the connection source includes the region setting section, the line-of-sight setting section, and the face direction detection section. Then, the video conference apparatus of the connection source may receive an image from the video conference apparatus of the connection destination through the network, perform region setting, line-of-sight direction setting and face detection, and transmit the results to the video conference system of the connection destination so that a desired image is obtained.
The configuration elements described in the above embodiments, including region dividing section 13, image conversion section 14, combining section 15, region setting section 17, line-of-sight setting section 18, and face direction detection section 19, may be configured by hardware, or by software implemented as a program executed by a computer. The program may be recorded on a computer-readable recording medium. The recording medium may be a non-transitory recording medium such as a flash memory.
The disclosure of Japanese Patent Application No. 2011-161910, filed on Jul. 25, 2011, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
The present invention can be applied to a digital still camera, a digital video camera, and a video system which transmits or broadcasts video to a different place to allow for viewing the video.
Priority application: Japanese Patent Application No. 2011-161910, filed Jul. 2011, JP (national).
PCT filing: PCT/JP2012/004504, filed Jul. 12, 2012 (WO), 371(c) date Jan. 24, 2014.