The present disclosure relates to an image generation method and an image generation apparatus, and more particularly to a three-dimensional image generation apparatus and a three-dimensional image generation method.
There are conventionally known techniques for acquiring distance distribution information, such as stereo cameras and light field cameras. The distance distribution information can be subjected to perspective projection transformation to obtain a point cloud. Furthermore, converting the point cloud into polygons makes it possible to generate a three-dimensional surface model having surfaces.
Acquiring an image (color distribution information) simultaneously with the distance distribution information makes it possible to generate three-dimensional (3D) data including texture information. Unlike an ordinary two-dimensional image, 3D data is advantageous in that it allows a user to enjoy an image from any viewpoint.
For example, Japanese Patent Application Laid-Open No. 2013-165475 discusses a method for generating a virtual viewpoint image in which the position of a virtual viewpoint and the focusing position are changed based on depth information calculated on the principle of a light field camera.
According to an aspect of the present disclosure, an image generation apparatus includes one or more memories storing instructions; and one or more processors that, upon execution of the stored instructions, are configured to operate as a first acquisition unit configured to acquire 3D data and a defective area of the 3D data, a second acquisition unit configured to acquire a movable range of a virtual camera, and a generation unit configured to generate mask data based on the movable range of the virtual camera and the defective area such that the defective area is not visible in the movable range of the virtual camera.
According to another aspect of the present disclosure, a control method for an image generation apparatus includes acquiring 3D data and a defective area of the 3D data, acquiring a movable range of a virtual camera, and generating mask data based on the movable range of the virtual camera and the defective area such that the defective area is not visible in the movable range of the virtual camera.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Exemplary embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
In three-dimensional (3D) data that is created using a stereo camera, a light field camera, or the like, a subject may not fit entirely in the field of view but may be partially cut off. If the subject is cut off in this way, the 3D data thereof is partially defective. When the 3D data is combined with a 3D background to enhance a sense of immersion, the subject may appear to be floating in the air, for example, causing a feeling of strangeness.
In view of this, the present disclosure is directed to providing an image generation apparatus that can reduce the feeling of strangeness in the case of 3D data in which data of a subject is partially missing.
Hereinafter, an image processing apparatus that generates a virtual viewpoint image according to a first exemplary embodiment of the present disclosure will be described with reference to
The image processing apparatus 200 is a server computer, for example, and includes a processing unit 201, a 3D data acquisition unit 202, an image generation unit 203, a defective area detection unit 205, and a mask data generation unit 206. The user interface 300 is a personal computer, for example, that is electrically connected to the image processing apparatus 200, and includes a display unit 301 and an operation unit 302.
The processing unit 201 included in the image processing apparatus 200 controls the entire image processing apparatus 200 using programs and data stored in advance, thereby implementing the functional units illustrated in
The display unit 301 provided in the user interface 300 includes a liquid crystal display, for example, and displays a graphical user interface (GUI) or the like for the user to operate the image processing apparatus 200. The operation unit 302 includes a keyboard, a mouse, a joystick, and/or the like, for example, and upon receipt of operations by the user, inputs various instructions to the display unit 301. A configuration in which the operation unit 302 is integrated with the display unit 301, such as a liquid crystal display equipped with a touch panel, may also be employed.
Next, an example of a generation method of 3D data according to the present exemplary embodiment will be described based on a configuration diagram of an imaging apparatus 400 illustrated in
The imaging element 502 has a structure in which tens of millions of pixels 503 that are photoelectric conversion elements are arranged in a lattice pattern, for example. The pixels are provided with color filters that transmit light of specific wavelengths such as red, green, and blue, and are arranged in a Bayer array, for example, whereby the color distribution information can be acquired.
One pixel 503 includes a microlens 504, a first photoelectric conversion unit 505, and a second photoelectric conversion unit 506, and obtains two different pieces of light information according to the positions of the two photoelectric conversion units 505 and 506, which are arranged in a horizontal direction (X direction). From the light information received at all the pixels, a first image composed of the luminance distribution of light received by the first photoelectric conversion unit 505 and a second image composed of the luminance distribution of light received by the second photoelectric conversion unit 506 are obtained.
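As a minimal illustration, and assuming only that the raw readout of the imaging element 502 is available as a numpy array of shape (H, W, 2) holding the two photoelectric conversion unit values of every pixel (an assumption about the data layout, not a description of an actual sensor interface), the first and second images are obtained simply by separating the two sub-pixel planes:

```python
import numpy as np

def split_dual_pixel(raw):
    # raw: (H, W, 2) array; plane 0 holds the values of the first photoelectric
    # conversion unit 505, plane 1 those of the second photoelectric conversion
    # unit 506, for every pixel 503.
    first_image = raw[..., 0]   # luminance distribution from one pupil region
    second_image = raw[..., 1]  # luminance distribution from the other pupil region
    return first_image, second_image
```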
An incident surface of the imaging element 502 and light receiving surfaces of the first photoelectric conversion unit 505 and second photoelectric conversion unit 506 are in a substantially Fourier conjugate relationship due to the microlens 504. Thus, the light receiving surfaces of the photoelectric conversion units and an exit pupil of the imaging optical system 501 are in a substantially optically conjugate relationship. Since the position distribution of the exit pupil and the position distribution of the light receiving surfaces of the photoelectric conversion units correspond to each other, providing the two different photoelectric conversion units makes it possible to separately receive light rays that have passed through different pupil regions.
The first image and the second image indicate pieces of luminance distribution information that are obtained by the light rays having passed through the different pupil regions.
A light ray that forms an image on the imaging surface ideally enters the same point on the imaging element 502 regardless of the position on the pupil through which the light ray passes. However, for a defocused light ray, the position at which the light ray enters the imaging element 502 varies depending on the position on the pupil through which the light ray passes. In other words, an image shift occurs according to the amount of defocus.
An amount of the image shift can be calculated by stereo matching between the first image and the second image, for example. With respect to a patch of a small area in one image, matching is performed relative to the other image in a direction of an epipolar line, and an amount of the image shift is determined by identifying a position with the highest correlation. By using the amount of the image shift calculated in this manner and a focal length and a focus position obtained from imaging information, a position is converted into a world coordinate system to generate 3D data.
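The following is a minimal sketch of such an image shift calculation, assuming that the first and second images are 2D grayscale numpy arrays, that the epipolar lines are horizontal, and that a simple sum-of-absolute-differences (SAD) matcher is used; the conversion of the resulting shift into a defocus amount and then into world coordinates depends on the focal length, focus position, and pupil geometry of the actual optical system and is therefore not shown.

```python
import numpy as np

def image_shift_map(first, second, patch=8, max_shift=16):
    """Per-patch horizontal image shift (in pixels) of `second` relative to `first`."""
    h, w = first.shape
    shifts = np.zeros((h // patch, w // patch), dtype=np.float32)
    for by in range(h // patch):
        for bx in range(w // patch):
            y0, x0 = by * patch, bx * patch
            ref = first[y0:y0 + patch, x0:x0 + patch].astype(np.float32)
            best_sad, best_d = np.inf, 0
            # Search along the epipolar line (horizontal direction) for the
            # position with the highest correlation (lowest SAD).
            for d in range(-max_shift, max_shift + 1):
                x1 = x0 + d
                if x1 < 0 or x1 + patch > w:
                    continue
                cand = second[y0:y0 + patch, x1:x1 + patch].astype(np.float32)
                sad = float(np.abs(ref - cand).sum())
                if sad < best_sad:
                    best_sad, best_d = sad, d
            shifts[by, bx] = best_d
    return shifts
```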
In addition, the data captured by the imaging unit 401 is grouped by object (type of subject) by an object detection unit 402. At this time, if the object is a person, a facial organ detection unit 403 detects facial feature points and calculates coordinates of facial organs. These pieces of information are linked to the 3D data and stored in a recording medium.
Next, the 3D data according to the present exemplary embodiment will be described.
Now, coordinate axes will be described. As illustrated in
The captured image includes only the upper body of the subject; the lower body is cut off. In this case, the 3D data on the object is likewise generated only for the upper body, and the portion corresponding to the lower body is defective.
The 3D data on the object according to the present exemplary embodiment is not limited to the subject imaged by the imaging apparatus as illustrated in
Next, a flow of processing of generating mask data from 3D data executed by the image generation apparatus 100 according to the present exemplary embodiment will be described with reference to the flowchart in
In step S101, 3D data is acquired. In the acquired 3D data, a single object is recognized and is cut out. The 3D data includes face recognition information so that it can be determined whether the object is a person.
In step S102, the processing unit 201 acquires information on a movable range of the virtual camera. The information on the movable range of the virtual camera is set in advance in the application, and indicates the range in which the virtual camera can move. This setting can be changed by performing an operation on the user interface 300.
In step S103, the processing unit 201 causes the defective area detection unit 205 to detect a defective area of the object in the 3D data. First, the defective area detection unit 205 performs labeling on the image data of the object using deep learning to estimate the overall shape of the object. Then, the defective area detection unit 205 compares the estimated shape with the 3D data to calculate a boundary surface where the object in the 3D data is defective, thereby determining a defective area.
When the defective area is detected, in step S104, the processing proceeds to mask data generation processing. In the mask data generation processing, the processing unit 201 generates mask data so that the defective area is not visible from the viewpoint of the virtual camera.
When the above-described steps are completed, the processing ends.
The above is an outline of the mask data generation processing in the image processing apparatus 200.
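Schematically, the flow of steps S101 to S104 can be expressed as follows; the four callables are hypothetical placeholders for the processing described above, and only the data flow between the steps is intended to be illustrative.

```python
def generate_mask_for_object(acquire_3d_data, acquire_movable_range,
                             detect_defective_area, generate_mask_data):
    data_3d = acquire_3d_data()                          # step S101: acquire 3D data
    movable_range = acquire_movable_range()              # step S102: virtual camera range
    defective_area = detect_defective_area(data_3d)      # step S103: defective area
    mask_data = generate_mask_data(movable_range, defective_area, data_3d)  # step S104
    return mask_data
```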
Next, the movable range of the virtual camera in virtual space will be described.
In the present exemplary embodiment, the z direction is set to the optical axis direction of the imaging camera, but the z direction may instead be set to the forward direction of the virtual camera in the world coordinates of the 3D space. The y direction is defined as the vertical direction in the world coordinates, and the x direction is defined as the horizontal direction in the world coordinates that is perpendicular to both the y direction and the z direction.
Since the position of the virtual camera can be restricted by installing the virtual camera in a specific spatial area and generating the virtual viewpoint image data as described above, it is possible to reduce the feeling of strangeness about the image that is viewed from a virtual viewpoint. The user may be allowed to freely set the movable range of the virtual camera based on the object.
Next, a method for obtaining a defective area of an object in 3D data will be described with reference to the flowchart in
First, in step S201, the processing unit 201 performs labeling of the input 3D data for each object. For example, labeling is performed using deep learning such that semantically categorized labels are attached. Alternatively, the user may specify an object and input a label. In the present exemplary embodiment, when an image as illustrated in
In step S202, the processing unit 201 estimates the overall shape of a target object based on the label. For example, based on a model shape of a representative model prepared in advance, the processing unit 201 corrects the posture and shape in the image data of the target object using deep learning, and estimates the overall shape of the object. In the present exemplary embodiment, the overall shape is estimated as illustrated in
Then, in step S203, the processing unit 201 compares the target object in the 3D data with the overall shape estimated as above. In step S204, the processing unit 201 calculates a defective part of the object, and determines the boundary between the defective part and the object as the defective area. In the present exemplary embodiment, a thick-line portion illustrated in
In this manner, in the present exemplary embodiment, object detection and shape estimation are performed using deep learning to determine the defective area. Alternatively, the user may directly input and specify the defective area via the operation unit 302. In this case, the user interface 300 is provided with a function of accepting input of the defective area, and the defective area is determined by a user operation.
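The comparison in steps S203 and S204 can be sketched as follows, assuming that the estimated overall shape and the observed object are both given as N×3 point clouds in world coordinates (the deep-learning-based shape estimation of step S202 is not shown) and that the distance thresholds are illustrative values in the same unit as the point clouds:

```python
import numpy as np

def find_defective_area(estimated_pts, observed_pts,
                        missing_thr=20.0, boundary_thr=40.0):
    # Distance from every point of the estimated overall shape to its nearest
    # point in the observed (partially defective) 3D data.
    d = np.linalg.norm(
        estimated_pts[:, None, :] - observed_pts[None, :, :], axis=2).min(axis=1)
    # Step S203: points of the estimated shape with no observed point nearby
    # are regarded as the defective part of the object.
    defective_part = estimated_pts[d > missing_thr]
    # Step S204: of those, the points that still lie close to the observed data
    # approximate the boundary between the defective part and the object,
    # i.e. the defective area.
    db = np.linalg.norm(
        defective_part[:, None, :] - observed_pts[None, :, :], axis=2).min(axis=1)
    defective_area = defective_part[db < boundary_thr]
    return defective_part, defective_area
```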
Next, a method for generating mask data will be described with reference to the flowchart in
First, in step S301, the processing unit 201 acquires the z coordinate value distribution of the target object from the 3D data, and calculates a position z′ that is a predetermined value away from the minimum value of the acquired z coordinate value distribution. A Z′ plane, which is the mask data generation plane, is set at the position where z = z′. In the present exemplary embodiment, the predetermined value used to calculate the position z′ is set to 50 mm.
Next, in step S302, the processing unit 201 draws lines connecting coordinates in the movable range of the virtual camera obtained as described above with coordinates in the defective area. In step S303, the processing unit 201 calculates an intersection point area P where these lines intersect with the Z′ plane.
Then, in step S304, the processing unit 201 generates mask data M so as to be superimposed on the intersection point area P, and the processing ends.
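A minimal sketch of steps S301 to S304 is given below, assuming that the movable range of the virtual camera, the defective area, and the target object are each sampled as sets of 3D points, and that the z coordinate increases away from the virtual camera so that the Z′ plane lies just in front of the object; the function names are illustrative, and the 50 mm offset follows the value described above.

```python
import numpy as np

def mask_points_on_z_plane(object_pts, movable_range_pts, defect_pts, offset=50.0):
    # Step S301: place the Z' plane a fixed offset in front of the object's
    # nearest z value (assumes z increases away from the virtual camera).
    z_prime = object_pts[:, 2].min() - offset
    hits = []
    # Step S302: draw lines from every sampled coordinate of the movable range
    # to every sampled coordinate of the defective area.
    for c in movable_range_pts:
        for p in defect_pts:
            dz = p[2] - c[2]
            if abs(dz) < 1e-9:          # line parallel to the Z' plane
                continue
            t = (z_prime - c[2]) / dz
            if 0.0 <= t <= 1.0:         # the plane lies between the two points
                # Step S303: intersection point with the Z' plane
                hits.append(c + t * (p - c))
    # Step S304: the mask data M must cover the area P spanned by these points.
    return np.array(hits)
```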
In the above-described exemplary embodiment, the intersection point area P on which the mask data M is superimposed is determined by calculating intersection points for all the coordinates in the movable range of the virtual camera. Alternatively, in order to speed up the processing, the number of search coordinates for the movable range may be decreased. For example, a similar effect can be obtained by setting an outer peripheral surface of the movable range as a search range and determining the intersection point area P.
If the mask data M is sufficiently large relative to the defective area, the search range can be further reduced. For example, if the mask data M is sufficiently large relative to the defective area in the −y direction and ±x direction, the search range can be limited as described below.
Specifically, the direction in which the defective area exists with respect to the object center is first defined as the defective direction. Then, of the outer peripheral surface of the movable range of the virtual camera, the one surface facing in the direction opposite to the defective direction is set as the search surface, as illustrated in
If the search surface in the movable range of the virtual camera is flat, the search range can be further reduced.
Specifically, as illustrated in
Conversely, as illustrated in
Although the 3D data used in the above-described exemplary embodiment is a still image, the effects of the present disclosure can be obtained similarly in a case where the 3D data is a moving image. In this case, it is not desirable that the mask data M moves between frames. Thus, in the present exemplary embodiment, a method for generating one piece of mask data M that can support all frames will be described. As a first modification example of the first exemplary embodiment, processing in the case where the 3D data is a moving image will be described with reference to the flowchart in
In the present exemplary embodiment, the 3D data is handled as 3D moving image data in which a plurality of consecutive pieces of still image 3D data is collected. In step S401, upon acquisition of the 3D data, the processing unit 201 breaks down the 3D data into frames and processes them. The movable range of the virtual camera is the same as that in the first exemplary embodiment, so description thereof will be omitted.
Next, the difference in the detection of a defective area from the first exemplary embodiment will be described with reference to the flowchart of
First, in step S501, only one frame of the 3D data is labeled, since labeling a single frame is sufficient to identify the target object. Next, in step S502, the processing unit 201 estimates the overall shape of the target object in all the frames based on the label. The method of shape estimation is the same as that in the first exemplary embodiment, so description thereof will be omitted.
Subsequently, in step S503, the 3D data of each frame is compared with the estimated overall shape. In step S504, the boundary surfaces of defective parts are calculated in all the frames, and these surfaces are all integrated to obtain an outer peripheral surface of the entire defective part, which is determined as the defective area.
Similarly, mask data is generated by taking all the frames into consideration. The differences from the first exemplary embodiment will be described with reference to the flowchart in
In step S601, in order to determine the Z′ plane, the z position distribution over all the frames is calculated, and the Z′ plane is set at a position a predetermined distance away from the minimum value of that distribution. Then, processing similar to that in the first exemplary embodiment is performed to generate the mask data M.
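Reusing the single-frame helper sketched above, the moving-image case can be expressed as follows, assuming that each frame provides its object points and its defective area points as N×3 arrays; a single Z′ plane and a single intersection area are computed over all the frames so that the mask data M stays stationary.

```python
import numpy as np

def mask_points_for_sequence(frames_object_pts, frames_defect_pts,
                             movable_range_pts, offset=50.0):
    # Step S601: take the z distribution over all the frames and integrate
    # the defective areas of all the frames into one point set.
    all_object = np.vstack(frames_object_pts)
    all_defect = np.vstack(frames_defect_pts)
    # The Z' plane and the intersection area are then computed once, exactly
    # as in the single-frame sketch, so the resulting mask M does not move.
    return mask_points_on_z_plane(all_object, movable_range_pts, all_defect,
                                  offset=offset)
```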
Generating the mask data M for the defective area in all the frames in this manner makes it possible to mask the defective area while keeping the mask data M stationary in the moving image.
In the first exemplary embodiment, the processing of acquiring information about the virtual camera viewpoint and information about the defective area from a piece of 3D data having a defective area, and generating a mask in the space between the virtual camera viewpoint and the defective area, has been described as an example. In a second exemplary embodiment, combining 3D data that contains a defective object with 3D data that contains only a defect-free object will be discussed. Specifically, an example of a method of arranging the object having the defective area when combining the data will be described. Components identical to those of the first exemplary embodiment are given the same reference numerals as in the first exemplary embodiment, and description thereof will be omitted.
The image generation apparatus 100 includes a 3D data combining unit 206, and the processing unit 201 combines 3D data based on combining position information determined by the position determination unit 207 described above.
When the plurality of pieces of 3D data is input, in steps S702 and S703, the processing unit 201 acquires the movable range of the virtual camera and detects the defective area. These operations in the present exemplary embodiment are similar to those in the first exemplary embodiment, and thus description thereof will be omitted.
Next, in step S704, the processing unit 201 determines the position of the object A having the defective area.
First, the processing unit 201 calculates a hidden area, which is the area hidden by the object B as viewed from each coordinate in the movable range of the virtual camera.
After the calculation of the hidden area, the processing unit 201 determines the position of the object A. First, the processing unit 201 determines the orientation of the object A. For example, if the object A is a person, the orientation of the object is set to a frontal orientation relative to the virtual camera based on the model estimated in step S703. Alternatively, if the object A is imaged by an imaging camera, the optical axis direction of the imaging camera may be set. Alternatively, a user interface (UI) operation unit that allows the user to input the orientation may be provided, and the user may input the orientation through the UI operation unit.
Next, the processing unit 201 arranges the object A, in the orientation determined as above, so that the defective area falls within the hidden area. At this time, the processing unit 201 calculates the volume of the non-defective portion of the object A that falls within the hidden area, and determines the position at which this volume is smallest, so that as little of the non-defective area as possible is located in the hidden area.
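The position determination of step S704 can be sketched as follows under these assumptions: candidate positions for the object A are given as translation vectors, the hidden area is available as a predicate is_hidden(point) whose computation from the object B geometry and the virtual camera coordinates is not shown, and the points of the object A have already been rotated into the orientation determined above; point counts serve as a simple proxy for volume.

```python
import numpy as np

def choose_position(candidate_offsets, defect_pts, nondefect_pts, is_hidden):
    best_pos, best_cost = None, np.inf
    for t in candidate_offsets:
        shifted_defect = defect_pts + t
        shifted_rest = nondefect_pts + t
        # The whole defective area must fall inside the hidden area.
        if not all(is_hidden(p) for p in shifted_defect):
            continue
        # Count how many non-defective sample points are also hidden
        # (a simple proxy for the hidden non-defective volume).
        cost = sum(is_hidden(p) for p in shifted_rest)
        if cost < best_cost:
            best_pos, best_cost = t, cost
    return best_pos
```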
After determining the arrangement position in this manner, the processing unit 201 causes the 3D data combining unit 206 to combine the 3D data.
Although the desirable exemplary embodiments of the present disclosure have been described above, the present disclosure is not limited to these exemplary embodiments, and various modifications and changes are possible within the scope of the gist of the present disclosure.
According to the present disclosure, since 3D mask data can be appropriately generated for a defective area of 3D data, it is possible to provide 3D data with a reduced feeling of strangeness.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2023-175023, filed Oct. 10, 2023, which is hereby incorporated by reference herein in its entirety.