The present technology relates to an image creation system, an image creation method, and a program, and particularly relates to a technical field using a free viewpoint image and a three-dimensional (3D) image.
There has been known a technique of generating a free viewpoint image (Volumetric Capturing) corresponding to an observation image from an arbitrary viewpoint in a three-dimensional space, on the basis of three-dimensional information representing a subject imaged from multiple viewpoints in the three-dimensional space. This is a technology for generating an image of a free viewpoint, without a position constraint, from multi-view images. The free viewpoint image is called "Free View", "Volumetric", or the like. The free viewpoint image is useful as, for example, a replay image in a sports broadcast.
For example, Patent Document 1 discloses a technique related to generation of a camerawork that can be regarded as a movement trajectory of a viewpoint.
In addition, Patent Document 2 discloses an image processing technique for intelligibly conveying play contents of table tennis.
Furthermore, in recent years, in play of soccer, basketball, and the like, there has been known a technique, called an EPTS (electronic performance and tracking system), for estimating the posture and position of a player or referee, the position and rotation of a ball, and the like on a designated field from images captured by dedicated cameras or from information obtained by sensors (acceleration sensors or GPS sensors) attached to a player or a ball.
A reproduced image of a play based on EPTS data can also be generated using computer graphics (CG).
Various image processing techniques such as these exist, but it is considered that new forms of image representation can be achieved by using a live-action image and a CG image more effectively.
Therefore, the present disclosure proposes a technology capable of producing a wider variety of images using a live-action free viewpoint image and a CG image.
An image creation system according to the present technology includes: an estimation information generation unit that generates estimation information regarding a subject on the basis of at least one of a captured image or sensor information; a free viewpoint image generation unit that generates a first three-dimensional model, which is a three-dimensional model of the subject, on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints, and generates a free viewpoint image, which is an image of an arbitrary viewpoint of the subject, using the first three-dimensional model; and a three-dimensional image generation unit capable of generating a three-dimensional image on the basis of the estimation information and a second three-dimensional model, which is a virtual three-dimensional model of the subject.
Various image content can be generated by switching between and combining a free viewpoint image based on a live-action image and a three-dimensional image that uses a virtual three-dimensional model together with the estimation information.
Hereinafter, embodiments will be described in the following order.
Note that in the present disclosure, an “image” includes both a moving image and a still image. Although moving image content creation will be mainly described as an example, the image to be created is not limited to a moving image, and may be a still image or a slide show using a plurality of still images. Furthermore, the “image” refers to an image actually displayed on a screen, but the “image” in a signal processing process or a transmission path until being displayed on the screen refers to image data.
The image creation system 300 includes a free viewpoint image generation unit 3, an estimation information generation unit 4, and a three-dimensional image generation unit 5.
The free viewpoint image generation unit 3 performs processing of generating a three-dimensional model of the subject on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints, and generating a free viewpoint image that is an image of an arbitrary viewpoint for a subject using the three-dimensional model.
For example, a plurality of imaging devices 10 images an area of a subject in a sports venue or the like, for example, a stadium where a game is being played, from various positions. The free viewpoint image generation unit 3 can receive images captured by these imaging devices 10 and generate a live-action image including the free viewpoint image.
When the free viewpoint image is generated, a three-dimensional model of the subject is generated from a multi-view captured image, and an image of an arbitrary viewpoint of the subject is generated using the three-dimensional model. In the present disclosure, a free viewpoint image generated by using a three-dimensional model based on live-action is included in a live-action image in order to be distinguished from a CG image described later.
Therefore, the live-action image referred to in the present disclosure includes a captured image itself obtained by an imaging device 10, an image (clip) in which a free viewpoint image is inserted into a part of a moving image of the captured image, a clip including only a free viewpoint image, and the like. Note that the "clip" refers to an image of a certain scene created by cutting out a portion of the recorded image or by further processing it.
The estimation information generation unit 4 generates estimation information regarding the subject on the basis of the captured image. In this case, various types of information are generated from the captured image obtained by the imaging device 20.
The estimation information is, for example, information regarding a position, posture, and movement of a person (player and the like) or an object (ball and the like) as a subject. For example, EPTS data to be described later corresponds to the estimation information.
For example, a plurality of imaging devices 20 images a sports venue or the like from various positions. The estimation information generation unit 4 can receive images captured by these imaging devices 20 and perform image analysis processing to generate estimation information.
Furthermore, for example, one or more acceleration sensors and/or one or more GPS sensors may be attached to a ball used in sports, a uniform worn by a person performing sports, a shoe worn by a person, a wristwatch, or the like. The estimation information generation unit 4 receives acceleration information and position information obtained by these sensors 29, and can generate estimation information on the basis of these pieces of information.
The three-dimensional image generation unit 5 can generate a three-dimensional image on the basis of the estimation information generated by the estimation information generation unit 4 and the three-dimensional model of the subject.
For example, the three-dimensional image generation unit 5 can use a virtual three-dimensional (3D) model of a subject (player or the like) created by CG to create a CG 3D image of a game, using the position, posture, movement, and the like of each player obtained as the estimation information.
Note that the three-dimensional image generation unit 5 can also generate a 3D image using a 3D model based on the live-action image generated by the free viewpoint image generation unit 3.
Such an image creation system 300 including the free viewpoint image generation unit 3, the estimation information generation unit 4, and the three-dimensional image generation unit 5 as described above is a system capable of outputting a live-action image including a free viewpoint image and a 3D image. In particular, as the 3D image, a 3D image can be generated using a 3D model by CG or a 3D model generated on the basis of a live-action image.
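Purely for reference, one possible way to organize the processing units described above is sketched in Python below; the class names, method names, and data fields are hypothetical examples and are not the actual implementation of the image creation system 300.

```python
# Hypothetical sketch of the units of the image creation system 300 (names are illustrative).
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EstimationInfo:
    # Per-frame estimates such as player positions/postures and the ball position.
    frames: List[Dict[str, Any]] = field(default_factory=list)


class EstimationInfoGenerator:
    """Corresponds conceptually to the estimation information generation unit 4."""
    def generate(self, captured_images: List[Any], sensor_info: List[Any]) -> EstimationInfo:
        # Analyze multi-view images and/or sensor data; details omitted in this sketch.
        return EstimationInfo()


class FreeViewpointImageGenerator:
    """Corresponds conceptually to the free viewpoint image generation unit 3."""
    def generate(self, multi_view_images: List[Any], camera_path: Any) -> Dict[str, Any]:
        # Build a live-action 3D model (first 3D model) and render an arbitrary-viewpoint clip.
        return {"fv_clip": None, "live_action_model": None}


class ThreeDImageGenerator:
    """Corresponds conceptually to the three-dimensional image generation unit 5."""
    def generate(self, estimation: EstimationInfo, virtual_model: Any) -> Dict[str, Any]:
        # Pose the virtual 3D model (second 3D model) according to the estimation information.
        return {"cg_clip": None}
```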
In
The output image generation unit 6 generates an output image on the basis of the free viewpoint image generated by the free viewpoint image generation unit 3 and the three-dimensional image generated by the three-dimensional image generation unit 5.
That is, the image creation system 301 is a system that can input, select, or combine a live-action image including a free viewpoint image and a 3D image using a 3D model by CG or a 3D model generated on the basis of the live-action to generate and output an output image. For example, the output image is distributed or broadcast as image content.
The output image generation unit 6 integrally handles the live-action image including the free viewpoint image and the 3D image using the 3D model by CG or live action, so that various images can be distributed or broadcast as the output image.
A more specific configuration example corresponding to the image creation system 300 or 301 having the configuration of
The volumetric system 100 corresponds to the free viewpoint image generation unit 3 in
The EPTS 200 has functions of the estimation information generation unit 4 and the three-dimensional image generation unit 5 in
The configuration of
The plurality of imaging devices 10 is configured as a digital camera device including an imaging element such as a charge coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, and the like, for example, and obtains a captured image as digital data. In the present example, each imaging device 10 obtains a captured image as a moving image.
In the present example, each imaging device 10 captures an image of a scene in which a competition such as basketball, soccer, or golf is being held, and each imaging device is arranged in a predetermined direction at a predetermined position in a stadium where the competition is held. The number of the imaging devices 10 is not particularly specified, but at least two imaging devices 10 are required to enable generation of a free viewpoint image. By increasing the number of imaging devices 10 and imaging the target subject from more angles, the accuracy of 3D model restoration of the subject can be improved, and the image quality of the free viewpoint image can be improved.
Furthermore, an event to be a free viewpoint image generation target is not limited to a sports competition such as basketball and includes a variety of events.
In addition, the plurality of imaging devices 20 is configured as a digital camera device similarly including an imaging element such as a CCD sensor, a CMOS sensor, and the like, and obtains a captured image as digital data. In the present example, each imaging device 20 obtains a captured image as a moving image and supplies the captured image to the EPTS 200.
The plurality of imaging devices 20 is also arranged at various positions in a basketball court or the like, for example, so that captured images from various viewpoints can be obtained.
Each imaging device 10 and each imaging device 20 are synchronized by a synchronization unit 30. That is, each imaging device 10 and each imaging device 20 capture each frame at the same timing. This is to match the timing relationship between each frame of the image obtained by the volumetric system 100 and the EPTS data obtained by the EPTS 200.
A configuration of the volumetric system 100 will be described.
The recording unit 11 records each of the captured images by the plurality of imaging devices 10 and supplies a part or all of the captured images to the FV render 12. That is, the recording unit 11 functions as a video server for generating a free viewpoint image.
The FV render 12 performs free viewpoint image generation using the captured image by the imaging device 10 supplied from the recording unit 11. Note that “FV” indicates “Free View”, that is, the FV render 12 performs rendering as free viewpoint image generation.
The FV render 12 generates a 3D model of the subject from the multi-view captured image obtained by each imaging device 10. An image of an arbitrary viewpoint for the subject is generated using the 3D model.
Camera path designation information CPD is input to the FV render 12, for example, in response to designation of a camera path by an operator's operation or automatic control.
Here, the camera path is information including at least information indicating a movement trajectory of the viewpoint in the free viewpoint image. For example, in a case of creating a free viewpoint image in which the position, the line-of-sight direction, and the angle of view (focal length) of the viewpoint are changed with respect to the subject for which the 3D model has been generated, the camera path information includes parameters necessary for defining a movement trajectory of the viewpoint, a change mode of the line-of-sight direction, a change mode of the angle of view, and the like.
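For reference, a camera path of the kind described above could, for example, be held as a set of keyframes of viewpoint position, line-of-sight direction, and angle of view that are interpolated for each output frame. The following Python sketch is a hypothetical illustration only and does not show the actual format of the camera path designation information CPD.

```python
# Hypothetical camera path representation: keyframes interpolated per output frame.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CameraKeyframe:
    time: float                            # seconds from the start of the FV clip
    position: Tuple[float, float, float]   # viewpoint position in the 3D space
    look_at: Tuple[float, float, float]    # point defining the line-of-sight direction
    focal_length_mm: float                 # angle of view expressed as a focal length


def interpolate(path: List[CameraKeyframe], t: float) -> CameraKeyframe:
    """Linearly interpolate the camera parameters at time t along the path."""
    path = sorted(path, key=lambda k: k.time)
    if t <= path[0].time:
        return path[0]
    if t >= path[-1].time:
        return path[-1]
    for a, b in zip(path, path[1:]):
        if a.time <= t <= b.time:
            w = (t - a.time) / (b.time - a.time)
            lerp3 = lambda p, q: tuple(p[i] + w * (q[i] - p[i]) for i in range(3))
            return CameraKeyframe(t, lerp3(a.position, b.position),
                                  lerp3(a.look_at, b.look_at),
                                  a.focal_length_mm + w * (b.focal_length_mm - a.focal_length_mm))


# Example: a two-keyframe path moving the viewpoint around the subject over 2 seconds.
path = [CameraKeyframe(0.0, (0.0, 2.0, 10.0), (0.0, 1.0, 0.0), 35.0),
        CameraKeyframe(2.0, (10.0, 2.0, 0.0), (0.0, 1.0, 0.0), 50.0)]
print(interpolate(path, 1.0).position)     # midpoint of the viewpoint movement
```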
In addition, tagging and scene designation may be performed by an operator's operation or automatic control. The tag mentioned here is, for example, information indicating a specific time point (frame) in a moving image, and is information given to the timing of a specific scene such as a shoot scene in basketball. The operator can perform a tagging operation at necessary timings, such as a start, an end, a shoot, a foul, or a serve in various sports, and the tag information is managed by a tag management unit 35. For example, a time stamp of the tagging is stored.
The tag management unit 35 manages the tag and can set a specific section using the tag and output scene designation information SC when the operator designates the scene. For example, the scene designation information SC is information for designating a start timing and an end timing as one scene.
The FV render 12 is configured as an information processing device that actually performs a process of creating a free viewpoint image, for example, a free view (FV) clip to be described later, according to the scene designation information SC and the camera path designation information CPD in the scene.
Details of FV rendering by the FV render 12 will be described later.
The live-action image including the free viewpoint image such as an FV clip generated by the FV render 12 and so on is stored in a storage 16. Note that depth information and camera path information of each subject such as a player and a ball are also stored in the storage 16 in association with the FV clip. The depth information is information regarding a distance in a depth direction from the imaging device 10, that is, information indicating an anteroposterior relationship of each subject from the viewpoint position (position of the imaging device 10).
In addition, the live-action 3D model generated by the FV render 12 for generating the free viewpoint image, particularly the 3D model based on the live action of the player who is the subject person in this example (referred to as a "live-action player model" for description), is stored in the storage 15.
Note that, in
The EPTS 200 will be described.
The recording unit 21 records each captured image by the plurality of imaging devices 20 and supplies each captured image to the EPTS data generation unit 22.
The EPTS data generation unit 22 performs analysis processing on each captured image, generates EPTS data individually, and integrates all the individual EPTS data to generate the EPTS data as a whole. The EPTS data includes, for example, information regarding the position of a player or the ball at each frame timing, the posture of the player and the like, the rotation speed or rotation direction of the ball, and so on.
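As a purely illustrative example, EPTS data of this kind could be organized per synchronized frame as follows; the field names below are hypothetical and are not the actual EPTS data format.

```python
# Hypothetical per-frame EPTS data record (field names are illustrative only).
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BallState:
    position: Tuple[float, float, float]     # ball position in court coordinates [m]
    spin_rpm: float                          # rotation speed
    spin_axis: Tuple[float, float, float]    # rotation direction (unit vector)


@dataclass
class PlayerState:
    position: Tuple[float, float, float]                                          # e.g. pelvis position [m]
    joints: Dict[str, Tuple[float, float, float]] = field(default_factory=dict)   # posture (joint positions)


@dataclass
class EptsFrame:
    frame_index: int                                                  # synchronized with the captured video frames
    players: Dict[str, PlayerState] = field(default_factory=dict)     # keyed by player ID
    ball: BallState = None


# Example: one frame with a single tracked player and the ball.
frame = EptsFrame(frame_index=0,
                  players={"player_23": PlayerState(position=(3.2, 1.0, 0.9))},
                  ball=BallState(position=(4.0, 1.5, 2.1), spin_rpm=120.0,
                                 spin_axis=(0.0, 0.0, 1.0)))
print(frame.players["player_23"].position)
```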
Furthermore, the EPTS data generation unit 22 may generate the EPTS data by using not only the captured image but also information obtained by the sensor 29, for example, information from an acceleration sensor embedded in a ball or a GPS sensor attached to a uniform of the player.
The EPTS data generation unit 22 can generate, as the EPTS data of the entire game, for example, information that can determine the position and posture of all the players participating in the game at each time point, the position and situation of the ball at each time point, and the like.
The EPTS data generated by the EPTS data generation unit 22 is stored in the storage 23 and can be referred to.
For example, the 3D render 27 may generate images during the game by the CG with reference to the EPTS data.
Furthermore, when the FV render 12 performs FV rendering, since the position, posture, and the like at each time point of a player or the like can be more accurately recognized by referring to the EPTS data, a free viewpoint image with higher accuracy can be generated.
The 3D render 27 performs rendering to generate a 3D image by CG using the 3D model.
As the 3D model used by the 3D render 27, a 3D model of each player (referred to as “CG player model” for description) is stored in the storage 24, and a 3D model of the background (referred to as “CG background model” for description) is stored in the storage 25.
The CG player model and the CG background model may be created in advance and stored in the storages 24 and 25, for example, before a game to be recorded.
Using the CG player model and the CG background model together with the information regarding the position and posture of each player obtained with reference to the EPTS data, the 3D render 27 can generate an image representing the players, the ball, and the like in a game as a 3D image by CG.
For example, a clip as a moving image by CG is generated. Since the clip is generated by CG, a clip including a free viewpoint image that is not limited to the arrangement of the imaging devices 10 can also be generated.
That is, using the EPTS data, the 3D render 27 can generate a clip as a game reproduction image by a CG image, or an FV clip by CG including a free viewpoint image.
Note that the 3D render 27 can also generate a 3D image using the live-action player model stored in the storage 15. For example, when a live-action player model for a certain player is generated, the 3D render 27 can switch the image of the player that has been generated using the CG player model to the image generated using the live-action player model.
The 3D images such as clips and the like generated by the 3D render 27 are stored in storage 28. In addition, depth information and camera path information are also stored in association with the clip. The camera path information is information regarding a virtual viewpoint position when a clip by CG is created.
The 3D display unit 34 in the figure illustrates a display device capable of 3D display. In a case where the 3D display unit 34 is present, the 3D image stored in the storage 28 is supplied to the 3D display unit 34, whereby a 3D video can be displayed.
On the other hand, in a case of considering outputting a 2D image as distribution or broadcasting, the 2D render 31 is provided. The 2D render 31 can receive a live-action image, for example, an FV clip from the storage 16, and can receive a 3D image, for example, a game reproduction clip by CG or an FV clip by CG, from the storage 28.
Then, the 2D render 31 selects or combines the live-action image or the 3D image including the input free viewpoint image to generate a 2D image for distribution or broadcasting. That is, the processing as the output image generation unit 6 in
In addition, the 2D render 31 can not only simply select or combine the live-action image and the 3D image, but also combine additional images or perform an image effect on the basis of the EPTS data.
The 2D render 31 reads the depth information and the camera path information together with the FV clip from the storage 16, and uses the read depth information and camera path information as reference values at the time of image processing for the FV clip.
In addition, the 2D render 31 also reads the depth information and the camera path information together with the clip by the CG from the storage 28, and uses them as reference values at the time of image processing on the clip by the CG.
In order for the 2D render 31 to refer to the EPTS data, the EPTS data stored in the storage 23 is supplied to the 2D render 31 through the coordinate transformation unit 26.
The coordinate transformation unit 26 transforms the EPTS data, which is information in a 3D space, into values on two-dimensional plane coordinates. The 2D render 31 generates a two-dimensional image, which is obtained by projecting a three-dimensional space onto a plane viewed from a certain viewpoint position. There is therefore a need to convert the EPTS data indicated in the three-dimensional space into two-dimensional information as seen from that viewpoint position. For this purpose, the camera path designation information CPD is supplied to the coordinate transformation unit 26, and the coordinate transformation is performed on the basis of the viewpoint position defined by the camera path.
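As one illustrative way to realize such a transformation, a standard pinhole-camera projection can map a position in the EPTS three-dimensional space onto two-dimensional image coordinates for the viewpoint defined by the camera path. The sketch below assumes simplified camera parameters and is not the actual processing of the coordinate transformation unit 26.

```python
import numpy as np


def project_point(p_world, cam_pos, cam_rot, focal_px, image_size):
    """Project a 3D point (EPTS coordinates) to 2D pixel coordinates for one viewpoint.

    cam_rot is a 3x3 world-to-camera rotation matrix, focal_px the focal length in
    pixels, image_size = (width, height). Returns None if the point is behind the camera.
    """
    p_cam = cam_rot @ (np.asarray(p_world, float) - np.asarray(cam_pos, float))
    if p_cam[2] <= 0:          # behind the viewpoint
        return None
    u = focal_px * p_cam[0] / p_cam[2] + image_size[0] / 2.0
    v = focal_px * p_cam[1] / p_cam[2] + image_size[1] / 2.0
    return (u, v)


# Example: project a ball position onto a 1920x1080 virtual camera placed at the origin.
print(project_point((1.0, 0.5, 5.0), (0, 0, 0), np.eye(3), 1200.0, (1920, 1080)))
```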
With use of the coordinate-transformed EPTS data, the 2D render 31 can superimpose an additional image on the position specified by the EPTS data or perform an image effect on the 2D image.
For example, in the present example, the operator can perform an operation of instructing image effects or additional image combination. In response to this operation, effect designation information EF is input to the 2D render 31. The 2D render 31 can perform, according to the input information, image effect processing or additional image combination processing at a predetermined position determined by the EPTS data in the 2D image plane.
Note that the effect designation information EF may be supplied to the 3D render 27, and an image effect or an additional image may be combined when the 3D image is generated by the 3D render 27.
As described above, the 2D image generated by the 2D render 31 through processing such as selection and combination of the live-action image and the 3D image, and further combination of additional images is stored in the storage 32. Then, the 2D image is read from the storage 32, supplied to a 2D display unit 33, and displayed as a 2D image.
Note that the processing function as described above as the 2D render 31 may be executed in the 3D render 27 or may be executed in the FV render 12.
The same portions as those in
In the example of
Although an example in which the imaging devices 10 and 20 are distinguished is illustrated in the drawing, the present invention is not limited thereto, and a captured image of one imaging device may be used for both free viewpoint image generation and EPTS data generation.
The EPTS data generation unit 22 performs EPTS data generation processing, and in this case, also has a function as the FV render 12. That is, the information processing device as the EPTS data generation unit 22 in this example also executes processing as the FV render 12.
The EPTS data generated by the EPTS data generation unit 22 is stored in the storage 23 and can be referred to by the 3D render 27.
The live-action image including the free viewpoint image generated by the FV render 12 is stored in the storage 16. In addition, the live-action player model generated by the FV render 12 is stored in the storage 15 and can be referred to by the 3D render 27.
Other configurations are similar to those in
With the configuration of
A configuration of the information processing device 70 used in the configuration of
Furthermore, the information processing device 70 can be configured as, for example, a dedicated workstation, a general-purpose personal computer, a mobile terminal device, and the like.
A CPU 71 of the information processing device 70 illustrated in
An image processing unit 85 is configured as a processor that performs various types of video processing. For example, the image processing unit 85 is a processor capable of performing any of 3D model generation processing, FV rendering, 3D rendering, 2D rendering, DB (Data Base) processing, image effect processing, image analysis processing, EPTS data generation processing, and the like.
The image processing unit 85 can be realized by, for example, a CPU, a graphics processing unit (GPU), general-purpose computing on graphics processing units (GPGPU), an artificial intelligence (AI) processor, or the like, separate from the CPU 71.
Note that the image processing unit 85 may be provided as a function in the CPU 71.
The CPU 71, the ROM 72, the RAM 73, the nonvolatile memory unit 74, and the image processing unit 85 are connected to one another through a bus 83. Furthermore, an input/output interface 75 is connected to the bus 83.
An input unit 76 including an operation element and an operation device is connected to the input/output interface 75.
For example, as the input unit 76, various types of operation elements and operation devices such as a keyboard, a mouse, a key, a dial, a touch panel, a touch pad, a remote controller, and the like are assumed.
A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
Furthermore, a display unit 77 including a liquid crystal display (LCD), an organic electro-luminescence (EL) panel, or the like, and an audio output unit 78 including a speaker or the like are integrally or separately connected to the input/output interface 75.
The display unit 77 performs various displays as a user interface. The display unit 77 includes, for example, a display device provided in a housing of the information processing device 70, a separate display device connected to the information processing device 70, and the like.
The display unit 77 executes display of an image for various types of image processing, a moving image to be processed and the like on a display screen on the basis of an instruction of the CPU 71. Furthermore, the display unit 77 displays various types of operation menus, icons, messages and the like, that is, displays as a graphical user interface (GUI) on the basis of an instruction of the CPU 71.
For example, an operator and the like of the image creation systems 300 and 301 according to the embodiment can check an image and perform various operations using the display unit 77 and the input unit 76.
In some cases, a storage unit 79 including a hard disk, a solid-state memory, and the like, and a communication unit 80 including a modem and the like are connected to the input/output interface 75.
The communication unit 80 executes communication processing through a transmission path such as the Internet or performs wired/wireless communication with various types of devices, and communication using bus communication and the like.
Furthermore, a drive 82 is also connected to the input/output interface 75 as necessary, and a removable recording medium 81 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is appropriately mounted.
By the drive 82, a data file such as an image file, various computer programs, and the like can be read from the removable recording medium 81. The read data file is stored in the storage unit 79, and images and sounds included in the data file are output by the display unit 77 and the audio output unit 78. Furthermore, the computer programs and the like read from the removable recording medium 81 are installed in the storage unit 79, as necessary.
In the information processing device 70, software can be installed through network communication by the communication unit 80 or the removable recording medium 81. Alternatively, the software may be stored in advance in the ROM 72, the storage unit 79 and the like.
Note that the storages 15, 16, 23, 24, 25, 28, and 32 illustrated in
The free viewpoint image generated by the FV render 12 will be described.
First, the viewpoint of the free viewpoint image will be described with reference to
For example, the viewpoint is gradually moved in the direction of the arrow C from a state of
An output clip including an FV clip as the free viewpoint image will be described.
For example, a previous clip is an actual moving image in a section of time codes Tc1 to Tc2 by a certain imaging device 10 among the plurality of imaging devices 10.
Furthermore, the subsequent clip is, for example, an actual moving image in a section of time codes Tc5 to Tc6 in another imaging device 10.
It is normally assumed that the image of the previous clip is image data of the imaging device 10 corresponding to the viewpoint at which the viewpoint movement of the FV clip starts, and the image of the subsequent clip is image data of the imaging device 10 corresponding to the viewpoint at which the viewpoint movement of the FV clip ends.
Then, in this example, the previous clip is a moving image having a time length t1, the FV clip is a free viewpoint image having a time length t2, and the subsequent clip is a moving image having a time length t3. A reproduction time length of the entire output clip is t1+t2+t3. For example, the output clip for 5 seconds can have a configuration including a 1.5-second moving image, a two-second free viewpoint image, and a 1.5-second moving image, or the like.
Here, the FV clip is illustrated as a section of time codes Tc3 to Tc4, but this may or may not correspond to the number of frames of the actual moving image. That is, as the FV clip, there are a case where the viewpoint is moved in a state where a time of the moving image is stopped (a case of Tc3=Tc4) and a case where the viewpoint is moved without stopping the time of the moving image (a case of Tc3≠Tc4).
For description, the FV clip in a case where the viewpoint is moved in a state where the time of the moving image is stopped (referred to as “time freeze”) is referred to as a “still image FV clip”, and the FV clip in a case where the viewpoint is moved without stopping the time of the moving image (referred to as “free run”) is referred to as a “moving image FV clip”.
That is, this is a case of generating a free viewpoint image in which the viewpoint moves with respect to a still image including one frame which is the frame F82.
Meanwhile, the moving image FV clip is as illustrated in
That is, this is a case of generating the free viewpoint image in which the viewpoint moves with respect to the moving image including a plurality of frames from the frame F102 to the frame F302.
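The difference between the still image FV clip and the moving image FV clip can be illustrated by how output frames of the FV clip are paired with source video frames and positions along the camera path. The following sketch is only a simplified illustration; the frame numbers follow the examples above.

```python
def fv_schedule(start_frame, end_frame, num_output_frames):
    """Pair each output frame of an FV clip with (source video frame, path position 0..1).

    If start_frame == end_frame, the source time is frozen (still image FV clip);
    otherwise the source time advances while the viewpoint moves (moving image FV clip).
    """
    schedule = []
    for i in range(num_output_frames):
        s = i / (num_output_frames - 1) if num_output_frames > 1 else 0.0
        src = round(start_frame + s * (end_frame - start_frame))
        schedule.append((src, s))
    return schedule


print(fv_schedule(82, 82, 5))      # time freeze: the viewpoint moves while frame F82 is fixed
print(fv_schedule(102, 302, 5))    # free run: the viewpoint moves while the frames advance
```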
In
For example, the output clip including the FV clip is generated in this manner and used as an image to be broadcasted.
The free viewpoint image generation processing performed in the FV render 12 will be described with reference to
For example, the FV render 12 can perform generation by a VDP (View Dependent Player) method and generation by a VIDP (View InDependent Player) method as the generation of the free viewpoint image.
The VDP method is a method for generating a free viewpoint image by pasting a texture image according to a viewpoint, to 3D data generated through visual hull from captured image data of a plurality of viewpoints. In the VDP method, there is a need to prepare an image for each viewpoint, as a texture image.
The VIDP method is a method for generating a 3D model of a subject as polygon mesh data from the captured image data of the plurality of viewpoints and generating a texture image as a UV map texture, so as to generate a free viewpoint image by computer graphics (CG) on the basis of the polygon mesh data and the UV map texture. Here, the UV map texture means two-dimensional data obtained by UV-developing the 3D model formed by polygon meshes, that is, data indicating color information for each polygon (for example, triangle).
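For reference, the relationship between the polygon mesh data and the UV map texture can be sketched as follows: each polygon vertex carries UV coordinates that index into a two-dimensional texture image holding the color information. The data in the sketch below is hypothetical and greatly simplified.

```python
import numpy as np


def sample_uv_texture(texture, uv):
    """Look up the color stored in a UV map texture at UV coordinates in [0, 1]."""
    h, w = texture.shape[:2]
    x = min(int(uv[0] * (w - 1)), w - 1)
    y = min(int(uv[1] * (h - 1)), h - 1)
    return texture[y, x]


# Hypothetical mini mesh: one triangle with per-vertex UVs into a 4x4 RGB texture.
texture = np.zeros((4, 4, 3), dtype=np.uint8)
texture[..., 0] = 255                            # a uniformly red texture for illustration
triangle_uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
colors = [sample_uv_texture(texture, uv) for uv in triangle_uvs]
print(colors)                                    # color of each vertex taken from the UV map texture
```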
As illustrated in
Here, as the imaging device 10 used to generate the free viewpoint image, there may be an imaging device 10 used to obtain a captured image used to generate 3D data (hereinafter, referred to as “camera for subject sensing”) and an imaging device 10 used to obtain a texture image attached to the 3D data when the free viewpoint image is generated (hereinafter, referred to as “camera for texture”).
For example, it is considered that some of all the imaging devices 10 used to generate the free viewpoint image are used as the cameras for subject sensing, and the other imaging devices 10 are used as the cameras for texture. Alternatively, the camera for subject sensing and the camera for texture do not necessarily need to be different imaging devices 10, and the single imaging device 10 can be used as both of the camera for subject sensing and the camera for texture. Moreover, all the imaging devices 10 can be used as such dual-use cameras.
When the 3D data is generated, foreground extraction processing P1 is executed, using captured image data obtained by each imaging device 10 as the camera for subject sensing (hereinafter, referred to as “captured image data for sensing”), and silhouette image data is generated.
In the foreground extraction processing P1 in
Then, for these foreground images, for example, by generating image data in which the image area of the subject is “1” and the other areas are “0”, silhouette image data indicating the silhouette of the subject as illustrated in the lower part of
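One common way to obtain such silhouette image data is background subtraction against an image of the venue without the subject. The following sketch assumes grayscale images held as NumPy arrays and a fixed threshold, and is only one of many possible implementations of the foreground extraction processing P1.

```python
import numpy as np


def make_silhouette(frame, background, threshold=30):
    """Return silhouette image data: 1 where the subject (foreground) is, 0 elsewhere."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)


# Tiny example: a 4x4 background with one brighter "subject" pixel in the frame.
background = np.full((4, 4), 50, dtype=np.uint8)
frame = background.copy()
frame[1, 2] = 200
print(make_silhouette(frame, background))
```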
In
Here, the 3D data is not individually generated for each target subject, for example, each player and the like. In a case where a plurality of target subjects is captured in the field of view of the camera and the silhouette image data is data indicating silhouettes of the plurality of subjects, a single piece of 3D data indicating a three-dimensional image of the plurality of subjects is generated, according to the silhouette image data.
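As a conceptual illustration of obtaining 3D data from silhouette image data through visual hull, the sketch below carves a grid of candidate 3D points by keeping only the points whose projections fall inside every camera's silhouette. The projection functions and grid used here are deliberately simplified and hypothetical.

```python
import numpy as np


def visual_hull(silhouettes, projections, grid_points):
    """Keep the 3D grid points whose projection lies inside the silhouette of every camera.

    silhouettes: list of binary images (1 = subject), projections: list of functions mapping
    a 3D point to integer pixel coordinates for each camera, grid_points: candidate 3D points.
    """
    kept = []
    for p in grid_points:
        inside_all = True
        for sil, proj in zip(silhouettes, projections):
            u, v = proj(p)
            if not (0 <= v < sil.shape[0] and 0 <= u < sil.shape[1]) or sil[v, u] == 0:
                inside_all = False
                break
        if inside_all:
            kept.append(p)
    return np.array(kept)


# Toy example with two orthographic "cameras" looking along the z axis and the x axis.
sil = np.zeros((8, 8), dtype=np.uint8)
sil[3:5, 3:5] = 1
cams = [lambda p: (int(p[0]), int(p[1])), lambda p: (int(p[2]), int(p[1]))]
grid = [(x, y, z) for x in range(8) for y in range(8) for z in range(8)]
print(len(visual_hull([sil, sil], cams, grid)))   # number of candidate points surviving the carving
```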
In
Specifically, in the FV generation processing P5, the free viewpoint image generation by the VDP method is performed on the basis of the 3D data, the captured image data of the texture camera, and the parameter data of the texture camera.
Furthermore, the 3D data is used to generate a 3D model that enables generation of the free viewpoint image by the VIDP method described above.
Specifically, according to 3D model generation processing P3 in
For reference,
Furthermore, the UV map texture described above is used to generate the free viewpoint image by the VIDP method. The UV map texture is generated based on the captured image data of the texture camera by the texture generation processing P4 illustrated in
In the texture generation processing P4 in the present example, the UV map texture is generated for each subject, corresponding to the fact that the polygon mesh data is generated for each subject in the 3D model generation processing P3.
In FV generation processing P6, free viewpoint image generation by the VIDP method is performed on the basis of the 3D model (polygon mesh data) of the subject obtained by the 3D model generation processing P3 and the UV map texture obtained by the texture generation processing P4.
Here, since the VDP method is a method of pasting a texture image prepared for each viewpoint, the VDP method has an advantage in that degradation of the image quality of the free viewpoint image can be suppressed even in a case where the 3D data to which the texture is pasted is coarse.
On the other hand, while the VIDP method has an advantage in that there is no need to prepare a texture image for each viewpoint, in a case where the polygon mesh is coarse, the coarseness of the polygon mesh is reflected in the image quality of the free viewpoint image.
Next, an example of an image produced by the image creation system according to the embodiment will be described. Hereinafter, examples of an image G1 to an image G10 will be described. Note that, in the description of each of the images G1 to G10 (
The “live-action FV” described below may be a clip including only the FV clip, or may be an output clip including the previous clip, the live-action FV, and the subsequent clip as illustrated in
In addition, the 3D image by CG does not require an actual imaging device in the first place, and the camera viewpoint is a virtual position. Therefore, all CG clips reproduced using the EPTS data can be regarded as free viewpoint images. Accordingly, whether it is a clip whose viewpoint position is changed in the middle of a moving image like the FV clip described above or a clip of a moving image whose viewpoint is fixed, the image generated in the 3D render 27 may be considered to be included in the "CGFV".
The image G1 in
For example, the 2D render 31 receives a live-action FV of a predetermined scene from the storage 16 and applies an image effect. In this case, the 2D render 31 determines the position and trajectory of the ball from the EPTS data and performs processing of combining an image of flames along the position and trajectory, thereby obtaining an image in which the ball being shot appears to be on fire.
Note that the position information in the three-dimensional space based on the EPTS data is converted into the position information in the two-dimensional plane from the viewpoint corresponding to the camera path of the scene by the coordinate transformation unit 26, whereby the 2D render 31 can appropriately combine the effect images in accordance with the position and trajectory of the ball.
An image G2 in
The image G2 is an image from a low angle within the court that cannot be captured except as a free viewpoint image. The 2D render 31 can output the image G2 by receiving, from the storage 16, the live-action FV generated with a camera path including such a viewpoint position.
Similarly, the image as illustrated in
By using the CG images of the players and the like together with the EPTS data, the 3D render 27 can generate, as the CGFV, an image from a low angle within the court that cannot normally be captured, similarly to the live-action FV. Therefore, the 2D render 31 can read such a 3D image from the storage 28, perform 2D rendering, and output the rendered image as a 2D image.
An image G4 in
For example,
For example, the 2D render 31 can generate a 2D image as illustrated by reading the CGFV of the scene of each 3-point shoot from the storage 28, combining the CGFV, and applying image processing such as determining the trajectory of the ball from the EPTS data.
An image G5 in
For example, in a bird's eye view from above a basketball court, a trajectory of a ball during a predetermined period such as during a game or during focused play is represented as a graphic image.
For example, such an image can be generated by the 2D render 31 or the 3D render 27 using the EPTS data.
An image G6 in
The image G6 in
The image G7 in
These images G6 and G7 can be generated, for example, by the 2D render 31 reading the necessary live-action FV from the storage 16 and performing drawing processing of the ball trajectory on the basis of the EPTS data.
An image G8 in
The image G8 displays a value indicating the height when the ball is at the highest position.
Furthermore, as an image example using the EPTS data and the live-action FV, there is also an image in which the height when a player jumps for shooting is superimposed with scales and numerical values.
Such an image G8 can be generated, for example, by the 2D render 31 reading necessary live-action FV from the storage 16 and performing drawing processing of characters, numerical values, and accompanying lines, scales, and the like on the basis of the EPTS data.
Furthermore, examples of the image using the EPTS data and the live-action FV or CGFV include an example of generating an image G9 (not illustrated) obtained by combining an image based on the EPTS data with the live-action FV generated by the FV render 12 or the CGFV generated by the 3D render 27.
For example, the positions of the five players of the same team on the court are dynamically connected by lines, the resulting zone is colored, and the color is changed along with the progress of the 24-second shot clock. In this way, the movement of each player of the team and the ball over time is expressed.
Such an image G9 can be generated, for example, by the 2D render 31 reading necessary clips from the storage 16 or the storage 28 and performing a process of drawing or coloring a line on the basis of the EPTS data.
The image G10-1 is an image showing each player from a viewpoint above the court at which an imaging device cannot normally be arranged, and the image G10-2 is an image from a viewpoint at a low position within the court at which an imaging device cannot normally be arranged. In this case, for example, in the FV clip as the live-action FV, some players are represented by CG and some players are represented by live action. In addition, in the image G10-1 and the image G10-2, an additional image representing the trajectory of the ball is also combined.
Such an image G10 can be generated, for example, by the 2D render 31 reading necessary live-action FV and CGFV from the storage 16 or the storage 28 and performing combination on the basis of the EPTS data.
Note that an example has been described in which the image G1 to the image G10 described above are generated by, for example, the processing of the 2D render 31 as output as a 2D image, but the processing of the 2D render 31 described above can also be performed by the FV render 12 or the 3D render 27.
Furthermore, in a case where CGFV is used as the image G3, the image G4, the image G9, and the image G10, it is also conceivable to output them as 3D images without performing 2D rendering.
A processing example in a case of producing the images G1 to G10 will be described with reference to
In the following description, the storages 15, 16, 23, 24, 25, 28, and 32 illustrated in
Note that the processing examples in
In addition, each process will be described as a processing example for a period after the start of play of a competition or event such as basketball or soccer.
In Step S101, the image processing unit 85 obtains estimation information such as the position, posture, and the like of the player and the ball. That is, the EPTS data is generated from the captured image of the imaging device 20 and stored in the storage 23.
In Step S102, the image processing unit 85 determines whether or not one play in the game is being continued. Here, one play may be determined according to the type of competition/event. For example, in basketball or soccer, it is only required to determine that the game is temporarily interrupted, such as a break due to the ball going out of a court, a break due to scoring, or a break due to fouling, or an end of a competition time (end of one set, interval, halftime), and to determine that one play ends.
In addition, a break timing of the game does not necessarily need to be the end of one play. For example, a period from immediately before shooting to after shooting may be determined as one play even if the game continues as it is.
The determination that the continuation of one play is interrupted may be automatically determined by the image processing unit 85 by image analysis and the like, or may be determined by the image processing unit 85 according to an operator input or the like.
In Step S103, the image processing unit 85 determines the end of the processing. For example, in a case where the end of the game is set as the end timing of the processing, the image processing unit 85 determines the end of the game automatically or according to an operator input and the like.
During one play, the image processing unit 85 repeats Steps S101, S102, and S103 described above. That is, while the EPTS data at each time point is acquired in Step S101, the break of one play, the end of the game, and the like are monitored.
At the timing when one play is interrupted, the image processing unit 85 determines in Step S104 whether or not to execute image content generation, that is, a process of generating clips as exemplified in the image G1, the image G6, the image G7, the image G8, and the image G9. For example, it is conceivable to ask the operator whether or not to generate image content every time one play ends, and detect and determine an answer input. Alternatively, the image processing unit 85 may analyze the content of the captured image of one play so far, and may automatically determine that content generation is performed in a case where the content is determined as a specific scene.
When the image content generation is not executed, the image processing unit 85 returns to Step S101.
In a case where it is determined that the image content generation is executed, the image processing unit 85 proceeds to Step S110 and acquires the EPTS data for one play of this time. For example, the image processing unit 85 reads EPTS data of one play period from the storage 23.
In Step S111, the image processing unit 85 performs processing of determining a player to be featured in the generated image content. For example, the image processing unit 85 determines a player to be expressed as an FV clip from a free viewpoint or a player to which an image effect is applied. That is, this is a process of determining a player to be a main subject in the generated image.
This process may be performed in accordance with an operator input or may be performed automatically. In a case where the operator has designated a specific player in advance or has performed an operation to designate a specific player for this one play, the image processing unit 85 determines the designated player as the main subject. Furthermore, the image processing unit 85 may automatically determine a player who has performed a specific play, for example, a player who has taken a shot, from the image content of the one play, and may determine that player as the main subject. Furthermore, the image processing unit 85 may present each player imaged in this one play, or a plurality of players having main movements, and allow the operator to select the player.
In Step S112, the image processing unit 85 selects a performance pattern. This is, for example, a process of selecting one of the types of performance of the image G1, the image G6, the image G7, the image G8, and the image G9.
For example, the image processing unit 85 selects whether to give an effect such as the image G1, to combine images presenting a flow line of a subject such as the image G6 and the image G7, to generate an image presenting a numerical value and the like based on EPTS data such as the image G8, or to generate an image combining an image with a clip such as the image G9.
This process may also be performed in accordance with operator input or automatically. In a case where the operator has designated a specific performance in advance or in a case where the operator has performed an operation of designating a specific performance for one play of this time, the image processing unit 85 determines the performance pattern by the designation. In addition, the image processing unit 85 can automatically select a performance pattern from the image content of one play according to, for example, a shoot scene, a pass scene, a foul scene, a type of a shoot, and the like.
In Step S113, the image processing unit 85 selects a necessary image on the basis of the EPTS data acquired in Step S110 and sets the selected image as a processing target. The image processing unit 85 selects at least an image of an angle at which the player determined in Step S111 is not hidden. Since the position of the player can be confirmed by the EPTS data, the image processing unit 85 can determine the image of the imaging device 10 capturing the player.
Therefore, the image processing unit 85 can select and read the FV clip or the 3D image capturing the player from the storages 16 and 28 to be processed, or can select an image of a necessary angle from the recording unit 11 for a necessary period at this point of time to generate the FV clip.
In Step S114, the image processing unit 85 performs processing of the performance image to be added to the selected image. For example, the image processing unit 85 generates a CG image for a performance effect on the basis of EPTS data.
For example, in a case where performance patterns such as the image G1, the image G6, the image G7, and the image G9 are selected in Step S112, the image processing unit 85 smooths the EPTS data in Step S114, and generates a CG image on the basis of the smoothed EPTS data.
The smoothing of the EPTS data is, for example, a process of smoothing the change in the position information of a player or the ball at each time point. Position information of a player or the ball based on actual EPTS data contains many fine variations, which makes it difficult to draw a smooth trajectory as it is. Therefore, a smooth trajectory is obtained by smoothing.
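As one simple example of such smoothing, a centered moving average over neighboring frames suppresses the fine variations in the position information; the window size in the sketch below is an arbitrary assumption.

```python
import numpy as np


def smooth_positions(positions, window=5):
    """Smooth a sequence of per-frame positions (N x 3) with a centered moving average."""
    positions = np.asarray(positions, dtype=float)
    half = window // 2
    smoothed = np.empty_like(positions)
    for i in range(len(positions)):
        lo, hi = max(0, i - half), min(len(positions), i + half + 1)
        smoothed[i] = positions[lo:hi].mean(axis=0)
    return smoothed


# Example: noisy ball positions along x become a smoother trajectory.
noisy = [(t + ((-1) ** t) * 0.2, 0.0, 0.0) for t in range(10)]
print(smooth_positions(noisy)[:3])
```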
Then, in a case of the performance pattern like the image G1, the image processing unit 85 determines the trajectory of the ball by the smoothed EPTS data, and generates a CG image as an effect to be superimposed on the trajectory.
In a case of the performance patterns such as the image G6 and the image G7, the image processing unit 85 determines the trajectory of the ball based on the smoothed EPTS data, and generates a CG image expressing the trajectory.
In a case of the performance pattern like the image G9, the image processing unit 85 determines the trajectory of each player by the smoothed EPTS data, and generates a CG image expressing a color of a line or an area connecting the players according to the position of each player in each frame.
Furthermore, for example, in a case where the performance pattern such as the image G8 is selected in Step S112, the image processing unit 85 performs a process of converting a value obtained by the EPTS data into display data in Step S114. That is, the image processing unit 85 determines a numerical value and the like to be expressed by the EPTS data, and generates a CG image as the numerical value or an additional image.
In Step S115, the image processing unit 85 performs processing of superimposing the CG image of the performance effect generated in Step S114 on the image selected (or generated) in Step S113.
Therefore, image content having contents as exemplified in the image G1, the image G6, the image G7, the image G8, and the image G9 is generated.
In Step S116, the image processing unit 85 outputs the generated image content. Then, the image processing unit 85 returns to Step S101.
Through the above processing, image content including scenes such as the image G1, the image G6, the image G7, the image G8, and the image G9 can be output.
Next, a processing example for generating image content including content such as the image G3 will be described with reference to
Note that, hereinafter, the same processes as those already described are denoted by the same step numbers, and detailed duplicate description is avoided.
In the processing example of
In Step S101, the image processing unit 85 generates EPTS data from the captured image of the imaging device 20 and stores the EPTS data in the storage 23.
In Step S121, the image processing unit 85 acquires the EPTS data and an avatar for the current frame. The avatar is a 3D model of a player, and refers to CG player models of the storage 24 in the case of this processing example.
In Step S123, the image processing unit 85 performs avatar processing based on the EPTS data. That is, the image processing unit 85 determines the position and posture of the player expressed in the current frame by the EPTS data, and generates the 3D image of each player from the CG player models.
In Step S124, the image processing unit 85 generates an image of a virtual viewpoint. That is, the image processing unit 85 generates an image from an arbitrary viewpoint using a 3D image by CG of each player. For example, the image processing unit 85 can also generate an image at an angle that cannot be normally captured, such as the image G3.
In Step S125, the image processing unit 85 outputs the generated image content of the virtual viewpoint. Then, if it is not determined in Step S103 that the process is to end, the image processing unit 85 returns to Step S101.
Therefore, the image processing unit 85 can generate and output image content that reproduces a game from various virtual viewpoints.
Next, a processing example for image content generation using EPTS data such as the image G10, the live-action FV, and the CGFV will be described with reference to
After the start of the game, the image processing unit 85 generates the EPTS data in Step S101, determines whether or not one play is continued in Step S102, and performs the processing from Step S121 to Step S125 until it is determined in Step S103 that the game is ended.
That is, during one play, the image processing unit 85 generates and outputs reproduction images of the game from various viewpoints in Steps S121 to S125 similarly to
However, the avatar acquired by the image processing unit 85 in Step S121 is the CG player model stored in the storage 24 in
At a timing when one play is interrupted, the processing proceeds from Step S102 to Step S131, and the image processing unit 85 determines whether or not a player is present in a designated area during the period of this one play.
The designated area mentioned here is an area in which captured images of the player can be obtained from various angles by as many imaging devices 10 as possible for convenience of camera arrangement. For example, as illustrated in
The designated area DA is set in advance in this manner, and the image processing unit 85 determines whether or not the player is present in the designated area DA in Step S131. In this case, whether or not a specific player is present in the designated area DA may be determined, or whether or not a player is present in the designated area DA may be determined without any limitation on the player.
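As a purely illustrative example, the determination in Step S131 could be realized by checking whether a player position obtained from the EPTS data falls within a rectangular designated area DA on the court plane; the area coordinates in the sketch below are hypothetical.

```python
def in_designated_area(player_pos, area=((-3.0, -3.0), (3.0, 3.0))):
    """Check whether a player position (x, y, z) from the EPTS data lies inside a
    rectangular designated area DA given as ((x_min, y_min), (x_max, y_max)) on the court."""
    (x_min, y_min), (x_max, y_max) = area
    x, y = player_pos[0], player_pos[1]
    return x_min <= x <= x_max and y_min <= y <= y_max


# Example: decide whether any frame of one play puts the player inside the area.
play_positions = [(-5.0, 0.0, 0.0), (1.2, -0.4, 0.0), (4.0, 2.0, 0.0)]
print(any(in_designated_area(p) for p in play_positions))   # True
```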
If there is no player (or specific player) in the designated area DA in the image in the current period of one play, the image processing unit 85 returns to Step S101. That is, in that case, the generation of the live-action player model is not performed.
In a case where it is determined in Step S131 that there is a player in the designated area DA, the image processing unit 85 proceeds to Step S132 and acquires EPTS data for one play of this time. For example, the image processing unit 85 reads EPTS data of one play period from the storage 23.
In Step S133, the image processing unit 85 determines whether or not there is a condition matching timing for generating a live-action player model (avatar by live-action) for the player who has been in the designated area DA during the current one play.
For example, the following conditions are determined.
The occlusion is a state in which an object in front hides an object behind. In this case, a state in which another player is present around a target player in the designated area DA and is hidden from the viewpoint of the captured image is referred to as occlusion.
It is one of the conditions for generating a high-accuracy live-action player model of the player that, among the captured images of the respective viewpoints by the plurality of imaging devices 10 at a certain timing during one play, there are a predetermined number or more of images in which the target player is captured without being occluded.
The condition that the posture is appropriate is a condition that a posture suitable for 3D modeling, for example, such as a so-called T pose or A pose, or a posture close thereto is desirable. The fact that the target player in the designated area DA is in a pose suitable for 3D modeling may be one of the conditions for generating a high-accuracy live-action player model.
Another condition relates to the degree of movement of the player. This is because there is a high possibility that motion blur occurs in the captured image at the time of intense movement, and such a captured image is not suitable for generating a high-accuracy live-action player model. Therefore, one of the conditions may be that a captured image is obtained in a state where the player is not moving violently.
For example, one or more conditions as described above are set, and the image processing unit 85 determines whether or not there is a timing at which the conditions are satisfied in one play of this time.
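Purely as an illustration, the determination in Step S133 could be sketched as a per-frame check that combines the conditions described above; the per-frame flags and thresholds below are hypothetical assumptions.

```python
def find_condition_matching_frames(frames, min_unoccluded_views=8, max_speed=1.0):
    """Return frame indices in one play at which a live-action player model could be built.

    Each element of `frames` is a dict with hypothetical keys:
      'unoccluded_views': number of cameras seeing the target player without occlusion,
      'pose_ok':          True if the posture is close to a T pose / A pose,
      'speed':            player speed [m/s], used as a proxy for motion blur.
    """
    matches = []
    for i, f in enumerate(frames):
        if (f['unoccluded_views'] >= min_unoccluded_views
                and f['pose_ok']
                and f['speed'] <= max_speed):
            matches.append(i)
    return matches


# Example with three hypothetical frames; only the second satisfies every condition.
frames = [
    {'unoccluded_views': 5, 'pose_ok': True, 'speed': 0.3},
    {'unoccluded_views': 10, 'pose_ok': True, 'speed': 0.4},
    {'unoccluded_views': 12, 'pose_ok': False, 'speed': 2.5},
]
print(find_condition_matching_frames(frames))   # [1]
```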
If it is determined that there is no timing to satisfy the condition, that is, there is no timing to obtain an appropriate captured image for 3D modeling in the image in the current period of one play, the image processing unit 85 returns to Step S101. That is, in that case, the generation of the live-action player model is not performed.
In a case where it is determined that there is a condition matching timing, the image processing unit 85 proceeds to Step S134 and generates an avatar for the player. That is, the image processing unit 85 performs 3D modeling using the captured images of the plurality of viewpoints at the timing when the conditions are met to generate a live-action player model.
In Step S135, the image processing unit 85 branches the processing by quality determination. The image processing unit 85 either automatically determines whether or not the generated live-action player model has sufficient quality, or displays the generated live-action player model on the display unit 77 so that whether the quality is sufficient is determined according to the operator's operation.
In a case where the quality is determined to be insufficient, the image processing unit 85 returns to Step S101. In this case, the image processing unit 85 discards the live-action player model created this time.
When the quality is sufficient, the image processing unit 85 proceeds to Step S136 and updates the avatar. For example, the live-action player model is stored in the storage 16 so that the live-action player model can be used in the processing as the 3D render 27. Then, the image processing unit 85 proceeds to Step S121.
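The flow of Steps S131 to S136 can be summarized as a decision procedure: search the one play for a timing at which the above conditions are satisfied, build the live-action player model at that timing, and keep it only if the quality check passes. The following is a minimal sketch of such a procedure; the data structure, the thresholds, and the build_model/assess_quality callables are hypothetical placeholders for illustration, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    time: float
    unoccluded_views: int    # viewpoints in which the target player is not hidden
    pose_score: float        # closeness to a T pose / A pose, 0..1
    motion_speed: float      # player speed used as a motion-blur proxy (m/s)

def find_matching_timing(frames, min_views=8, min_pose_score=0.7, max_speed=1.5):
    """Return the first timing in the one play that satisfies all conditions."""
    for f in frames:
        if (f.unoccluded_views >= min_views
                and f.pose_score >= min_pose_score
                and f.motion_speed <= max_speed):
            return f
    return None  # no suitable timing: the live-action model is not generated

def maybe_update_avatar(player_id, frames, build_model, assess_quality, store):
    """Steps S131-S136: build a live-action model only when a matching timing
    exists and the result passes the quality check; otherwise keep the CG model."""
    timing = find_matching_timing(frames)
    if timing is None:
        return False                              # back to Step S101
    model = build_model(player_id, timing.time)   # 3D modeling from multi-view images
    if not assess_quality(model):                 # automatic or operator quality check
        return False                              # discard the model
    store[player_id] = model                      # avatar update (Step S136)
    return True
```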
For the player whose avatar as the live-action player model is generated as described above, an image using the live-action player model is generated instead of the CG player model in the subsequent processing of Steps S121 to S125.
For this reason, for example, the image content to be produced gradually changes, player by player, from the CG image to the live-action image as the game progresses.
By the way, as a modification of
Furthermore, in addition to sequentially generating images using the live-action player model instead of the CG player model during the game as illustrated in
Moreover, the CG player model may be used for all the players until live-action player models have been generated for all the players, and once live-action player models have been generated for all the players, images using the live-action player models may be generated all at once.
Furthermore, as a modification of
Next, a processing example for generating image content including content such as the image G4 will be described with reference to
After the start of the game, the image processing unit 85 performs the processing of Steps S101 and S140 until it is determined in Step S103 that the game is ended.
In Step S101, the image processing unit 85 generates EPTS data from the captured image of the imaging device 20 and stores the EPTS data in the storage 23.
In Step S140, the image processing unit 85 determines whether or not it is the timing of the break of the game. Here, for example, an interval between basketball quarters, halftime, a timing of the end of a game, or the like is determined as the timing of the break of the game.
In a case where it is determined that the timing of the break of the game has come by analysis of the captured image, input by the operator, or the like, the image processing unit 85 proceeds from Step S140 to Step S141, and determines whether or not to execute image content generation, that is, a process of generating a clip as exemplified in the image G4. This determination can be performed by operator input or automatic processing as in Step S104 in
When the image content generation is not executed, the image processing unit 85 returns to Step S101.
In a case where it is determined to execute image content generation, the image processing unit 85 proceeds to Step S142 and acquires EPTS data of a necessary period. For example, at the timing of the interval at an end of a first quarter, the image processing unit 85 reads the EPTS data in a period of the first quarter from the storage 23.
In Step S143, the image processing unit 85 generates a CG image on the basis of the EPTS data of the target scene. For example, in a case of generating an image in which the shooting scenes within the period are collected, the positions of each player and the ball are determined from the EPTS data of the shooting scenes, and a combined image by CG is generated.
In Step S144, the image processing unit 85 outputs the generated image content. Then, unless it is determined in Step S103 that a timing such as the end of the game has come, the image processing unit 85 returns to Step S101.
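As a rough illustration of Steps S142 to S144 for an image such as the image G4, the sketch below collects the shooting scenes of the acquired period from the EPTS data and layers them into one combined CG view. The record fields ('event', 'ball_xyz', 'players') and the render_scene/composite callables are assumptions for illustration; the actual EPTS data format is not specified here.

```python
def collect_shot_positions(epts_records):
    """epts_records: list of dicts assumed to carry 'event', 'ball_xyz', 'players'."""
    return [(r["ball_xyz"], r["players"])
            for r in epts_records if r.get("event") == "shot"]

def render_combined_shots(epts_records, render_scene, composite):
    """Layer every shooting scene of the period into one combined CG image.
    render_scene and composite are supplied by the CG pipeline (assumed)."""
    layers = [render_scene(ball=ball, players=players, alpha=0.6)
              for ball, players in collect_shot_positions(epts_records)]
    return composite(layers) if layers else None
```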
Next, a processing example for generating image content including content such as the image G5 will be described with reference to
After the start of the game, the image processing unit 85 performs the processing of Steps S101 and S140 until it is determined in Step S103 that the game is ended.
In Step S101, the image processing unit 85 generates EPTS data from the captured image of the imaging device 20 and stores the EPTS data in the storage 23.
In Step S140, the image processing unit 85 determines whether or not it is the timing of the break of the game.
In a case where it is determined that the timing of the break of the game has come, the image processing unit 85 proceeds from Step S140 to Step S141, and determines whether or not to execute image content generation, that is, processing of generating a clip as exemplified in the image G5.
When the image content generation is not executed, the image processing unit 85 returns to Step S101.
In a case where it is determined to execute image content generation, the image processing unit 85 proceeds to Step S142 and acquires EPTS data of a necessary period.
In Step S151, the image processing unit 85 performs smoothing of the EPTS data in the period acquired in Step S142. This is performed in order to express a smooth trajectory, similarly to the smoothing described as one of the processes in Step S114 in
In Step S152, the image processing unit 85 converts the smoothed EPTS data into display data to obtain an image of the period. In this way, an image by CG that expresses the movement trajectory of the ball, a player, and the like on the basis of the EPTS data in the period acquired in Step S142 is generated. The generated image may be a moving image in which each frame shows the movement trajectory up to the corresponding time point in the period so that the trajectory is gradually drawn, or may be a still image showing the trajectory by combining images of the movement positions within the period.
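As one possible concrete form of Steps S151 and S152, the sketch below applies a simple moving average to a position sequence and then produces frames in which the trajectory is gradually drawn. The choice of a moving average and the window size are assumptions; the embodiment does not prescribe a particular smoothing filter.

```python
def moving_average(positions, window=5):
    """positions: list of (x, y, z) tuples sampled over the period."""
    half = window // 2
    smoothed = []
    for i in range(len(positions)):
        chunk = positions[max(0, i - half): i + half + 1]
        smoothed.append(tuple(sum(c[k] for c in chunk) / len(chunk) for k in range(3)))
    return smoothed

def trajectory_frames(positions, window=5):
    """Each output frame shows the smoothed trajectory up to that time point."""
    smoothed = moving_average(positions, window)
    return [smoothed[: i + 1] for i in range(len(smoothed))]
```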
In Step S144, the image processing unit 85 outputs the generated image content. Then, unless it is determined in Step S103 that a timing such as the end of the game has come, the image processing unit 85 returns to Step S101.
In addition, although
Next, processing for improving image quality in a case where an output image is generated by switching or combining a free viewpoint image (live-action FV) based on live-action by the FV render 12 and a 3D image (CGFV) using CG by the 3D render 27 in the system of the embodiment will be described.
A sequence as illustrated in
The camera path designation information CPD is supplied to the FV render 12 and the 3D render 27 at the time of generating the free viewpoint image according to the operation of the operator.
The FV render 12 can generate a free viewpoint image by the specified camera path, that is, a live-action FV according to the camera path designation information CPD.
In addition, the 3D render 27 can generate a free viewpoint image by the designated camera path, that is, CGFV according to the camera path designation information CPD.
The 2D render 31 can acquire the live-action FV generated by the FV render 12, the depth information DPT, and the quality determination information CI. The depth information DPT is depth information of each subject (player or ball) viewed from the viewpoint position of each frame of the live-action FV. The quality determination information CI is information used for quality determination, and includes time information of each frame and information of the area where a player as a subject is present.
In addition, the 2D render 31 can acquire the CGFV generated by the 3D render 27 and the depth information DPT.
The 2D render 31 selects or combines the live-action FV and the CGFV to generate an output image. The depth information DPT and the quality determination information CI are used at the time of selection and combination.
The generated output image is distributed or broadcast, for example, according to an operator's operation.
In this sequence example, the 2D render 31 selects or combines the two free viewpoint images, the live-action FV and the CGFV, to generate an output image. That is, the image content as the output image is obtained by selecting or combining the live-action FV and the CGFV.
Specifically, there is an example in which one of the live-action FV and the CGFV is selected as the output image. There is also an example in which the live-action FV and the CGFV are selected for each frame of the output image in the time axis direction. Furthermore, there is also an example in which a subject is combined in an image of one frame of the output image.
Note that the process of generating the output image by "selecting" between the live-action FV and the CGFV includes a processing example in which both the live-action FV and the CGFV are generated and then one of them is selected as the output image. Alternatively, there is a processing example in which the live-action FV or the CGFV is selected first on the basis of the camera path and the like; in a case where the live-action FV is selected, the live-action FV is generated by the FV render 12 and the 2D render 31 generates the output image using the live-action FV, and in a case where the CGFV is selected, the CGFV is generated by the 3D render 27 and the 2D render 31 generates the output image using the CGFV. Either of these may be adopted.
First, an example of selection of the live-action FV and the CGFV will be described.
In a case of generating the image content to be output, there is an example in which the quality is simply determined and one of the live-action FV and the CGFV is selected. For example, quality determination is performed using the quality determination information CI; in a case where the live-action FV can be generated with high quality, the live-action FV is set as the output image as it is, and in a case where the live-action FV cannot be generated with high quality, the CGFV is set as the output image.
In addition to selecting one of the live-action FV and the CGFV as the entire FV clip in this manner, selection may be performed such that the live-action FV is used in a certain period of the FV clip and the CGFV is used in a certain period.
For example, in a case where the quality of each frame in a period from the frame FR #x to the frame FR #(y-1) (not illustrated) is degraded in the live-action FV, CGFV is selected for each frame in this period, so that the output image is as illustrated in
As described above, the quality determination for selecting the live-action FV and CGFV for the entire FV clip or for each frame is performed as follows, for example.
First, there is an element for determining a relationship between the camera path and the arrangement of the plurality of imaging devices 10 as a determination element.
From the field of view of each imaging device 10 according to the preset camera arrangement, an area where the fields of view of a predetermined number or more of imaging devices 10, for example, three or more imaging devices, overlap can be specified on a basketball court.
The accuracy of the 3D model for generating a free viewpoint image increases as captured images from a larger number of viewpoints are obtained. Therefore, an area that can be imaged by a predetermined number or more of imaging devices 10, for example, three or more imaging devices, is specified in advance. For example, such an area is the designated area DA in
Then, determining whether or not the subject in the designated area DA is targeted in the current camera path is one of the quality determinations.
That is, if a subject that is a target of the free viewpoint image in the current camera path is in the designated area DA, the quality is good as a determination factor. Conversely, in a case where the target subject is not in the designated area DA, the quality of the live-action FV may be degraded.
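A designated area of this kind could, for example, be precomputed by sampling court positions and counting how many imaging devices 10 cover each position, as in the hedged sketch below. The 2D field-of-view model and the camera parameter names are assumptions for illustration only.

```python
import math

def in_fov(cam, point):
    """cam: dict with 'pos' (x, y), 'dir' (unit vector), 'half_angle' (rad), 'max_range'."""
    dx, dy = point[0] - cam["pos"][0], point[1] - cam["pos"][1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > cam["max_range"]:
        return False
    cos_a = (dx * cam["dir"][0] + dy * cam["dir"][1]) / dist
    return math.acos(max(-1.0, min(1.0, cos_a))) <= cam["half_angle"]

def designated_area(cameras, court_w=28.0, court_h=15.0, step=0.5, min_cams=3):
    """Return court grid points covered by at least min_cams imaging devices."""
    area = []
    y = 0.0
    while y <= court_h:
        x = 0.0
        while x <= court_w:
            if sum(in_fov(c, (x, y)) for c in cameras) >= min_cams:
                area.append((x, y))
            x += step
        y += step
    return area
```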
In addition, as a relationship between the camera path and the field of view of each of the plurality of imaging devices, it may be determined whether or not the camera path passes through an area exceeding the zoom limit.
Here, the zoom limit is set as an upper limit of a zoom magnification for maintaining the quality required by the live-action FV. For example, in a case where the imaging device 10 uses both an optical zoom and a digital zoom, a maximum magnification of the optical zoom may be set as the zoom limit. Furthermore, not limited to the zoom system, a certain predetermined zoom magnification may be set as the zoom limit.
Whether or not the camera path is within the zoom limit area is a determination factor of the quality of the live-action FV. That is, if the camera path does not fall within the zoom limit area, the quality of the live-action FV is good as far as this determination element is concerned, and if the camera path falls within the zoom limit area, the quality may be degraded in the live-action FV as a whole or in that section of the camera path.
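As an illustration of this zoom-limit element, the following sketch flags the frames of a camera path whose viewpoint falls inside a zoom boundary modelled here as an ellipse; the elliptical model and the parameter names are assumptions, not the actual determination logic.

```python
def inside_zoom_boundary(point, center, rx, ry):
    """True if the 2D point lies inside an elliptical zoom boundary ZB."""
    x, y = point[0] - center[0], point[1] - center[1]
    return (x / rx) ** 2 + (y / ry) ** 2 <= 1.0

def flag_zoom_sections(camera_path, center, rx, ry):
    """camera_path: list of per-frame viewpoint (x, y) positions.
    Returns a per-frame list: True where the CGFV should be considered."""
    return [inside_zoom_boundary(vp, center, rx, ry) for vp in camera_path]
```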
With determination of the arrangement of the imaging devices 10, the designated area DA and the zoom boundary ZB are specified as illustrated in each drawing. The zoom boundary ZB is a line indicating the zoom limit, and indicates that the subject inside an elliptical shape of the zoom boundary ZB has a magnification exceeding the zoom limit. Furthermore, a target player 90 in the free viewpoint image is indicated by a black circle, and other players 91 are indicated by white circles.
In
Therefore, from these viewpoints, it is determined that the quality of the current live-action FV by the camera path CP1 can be maintained. Therefore, it is conceivable to select the live-action FV for all the frames in the camera path CP1. In the drawing, the camera path CP1 is indicated by a double line, and the double line indicates a section in which the live-action FV is selected.
On the other hand,
In this case, the quality of the live-action FV by the camera path CP2 may be degraded in the image of the thick line section. Therefore, it is conceivable that CGFV is selected for the frame in the thick line section of the camera path CP2, and the live-action FV is selected for the frame in the section indicated by the double line that does not exceed the zoom boundary.
Note that, in this case, there is also a concept that the entire output image is based on CGFV in consideration of the quality of a part of the live-action FV.
In these cases, since it is conceivable that the quality of the free viewpoint image of the target player 90 cannot be maintained, the frames of the entire sections of the camera paths CP3 and CP4 are based on CGFV.
The above is an example of quality determination by the designated area DA and the zoom boundary ZB, but, moreover, there is an example of determining an arrangement relationship between a subject (target player 90) for which a free viewpoint image is to be generated and other players 91 in the image. Specifically, the degree of congestion around the target player 90 is determined.
It is determined whether or not a distance between the players is within a certain distance.
Furthermore, it is determined whether or not occlusion by the other players 91 with respect to the target player 90 is a predetermined number or more as viewed from the imaging devices 10 configuring the viewpoint of the camera path.
In a case where the distance between the players is within a certain range, or in a case where the occlusion is equal to or more than a predetermined number, the accuracy of the 3D model for the target player 90 decreases. Therefore, in such a case, the quality of the live-action FV may not be maintained, and CGFV is selected.
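A minimal sketch of such a congestion check is shown below: it tests whether any other player is within a certain distance of the target player 90 and counts, per camera, whether the line of sight to the target passes close to another player. The 2D geometry and all thresholds are assumptions for illustration.

```python
import math

def too_crowded(target_xy, other_xys, min_dist=1.0):
    """True if any other player is within min_dist metres of the target player."""
    return any(math.dist(target_xy, p) < min_dist for p in other_xys)

def _blocks(cam, target, other, width):
    """Does 'other' lie near the segment cam->target (i.e., in front of the target)?"""
    cx, cy = target[0] - cam[0], target[1] - cam[1]
    ox, oy = other[0] - cam[0], other[1] - cam[1]
    seg_len2 = cx * cx + cy * cy
    if seg_len2 == 0:
        return False
    t = (ox * cx + oy * cy) / seg_len2
    if not 0.0 < t < 1.0:                      # the other player must be between camera and target
        return False
    px, py = cam[0] + t * cx, cam[1] + t * cy  # closest point on the sight line
    return math.dist((px, py), other) < width

def occlusion_count(target_xy, other_xys, camera_xys, block_width=0.6):
    """Count cameras whose line of sight to the target is blocked by another player."""
    blocked = 0
    for cam in camera_xys:
        if any(_blocks(cam, target_xy, p, block_width) for p in other_xys):
            blocked += 1
    return blocked

def prefer_cgfv(target_xy, other_xys, camera_xys, max_occluded=3):
    """True when congestion suggests the live-action FV quality cannot be maintained."""
    return too_crowded(target_xy, other_xys) or \
           occlusion_count(target_xy, other_xys, camera_xys) >= max_occluded
```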
The above is an example in which the live-action FV and the CGFV are selected in a time axis direction. Next, an example in which subjects of the live-action FV and the CGFV are combined in an image of one frame will be described.
In addition, player images 92C, 93C, 94C, and 95C obtained by individually rendering the players 92, 93, 94, and 95 on the basis of the EPTS data are illustrated as CGFVs, and player images 92R, 93R, 94R, and 95R obtained by individually rendering the players 92, 93, 94, and 95 are illustrated as live-action FVs.
Here, as the quality determination, the designated area DA is indicated in the frame. This is an area where the high-quality live-action FV can be generated, which is calculated from the camera arrangement and the selection arrangement in advance as described above. In the drawing, the designated area DA is three-dimensionally indicated by a broken line.
Then, it is assumed that the player 92 is inside the designated area DA, and the other players 93, 94, and 95 are outside the designated area DA.
In this case, as an output image of the frame, as illustrated in the drawing, the player image 92R by live-action FV and the player images 93C, 94C, and 95C by CGFV are combined and generated.
In this way, as the player image in the frame, combination in which CGFV is used can be performed on the basis of the quality determination while using the live-action FV as much as possible.
Hereinafter, a specific processing example will be described. Each of the following processing examples is processing of the image processing unit 85 of the information processing device 70 as the 2D render 31 in the sequence of
In Step S301, the image processing unit 85 sets the camera path of the FV clip to be generated this time according to the camera path designation information CPD by the operator's input.
In Step S302, the image processing unit 85 performs quality determination processing of the live-action FV. For example, as described above, various types of quality determination elements are determined, such as whether or not the target player 90 of the free viewpoint image by the current camera path is in the designated area DA, whether or not the camera path exceeds the zoom boundary ZB, whether or not the distance between players is within a certain range as player arrangement, and whether or not the occlusion is equal to or greater than a predetermined number.
Then, in a case where it is determined that the live-action FV satisfies the quality condition, the image processing unit 85 proceeds from Step S303 to Step S304, generates a clip as the live-action FV, and sets the clip as the output image in Step S306.
On the other hand, in a case where it is determined that the live-action FV does not satisfy the quality condition, the image processing unit 85 proceeds from Step S303 to Step S305, generates an FV clip by CGFV, and sets the FV clip as the output image in Step S306.
Through the above processing, the FV clip by the live-action FV or the FV clip by the CGFV is selected according to the quality determination, and is set as the output image.
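Steps S301 to S306 amount to a single whole-clip decision, which could be organized as in the sketch below; quality_ok, render_live_action_fv, and render_cgfv are hypothetical callables standing in for the aggregated quality determination elements and the outputs of the FV render 12 and the 3D render 27.

```python
def generate_fv_clip(camera_path, quality_ok, render_live_action_fv, render_cgfv):
    """quality_ok(camera_path) aggregates the determination elements
    (designated area DA, zoom boundary ZB, player distance, occlusion)."""
    if quality_ok(camera_path):          # Step S303 -> S304: live-action FV clip
        clip = render_live_action_fv(camera_path)
    else:                                # Step S303 -> S305: CGFV clip
        clip = render_cgfv(camera_path)
    return clip                          # Step S306: set as the output image
```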
In Step S310, the image processing unit 85 sets the camera path of the FV clip to be generated this time according to the camera path designation information CPD by the operator's input.
In Step S311, the image processing unit 85 performs quality determination processing of the live-action FV for each frame configuring the camera path. That is, determination elements such as the designated area DA, the zoom boundary ZB, the distance between players, the occlusion and the like are determined for each frame.
In Step S312, the image processing unit 85 sets a variable N to “1”, and sets a variable Nmax as the total number of frames of the FV clip to be generated.
In Step S313, the image processing unit 85 branches the processing with reference to the quality determination result of the Nth frame. In a case where it is determined that the live-action FV in the Nth frame satisfies the quality condition, the image processing unit 85 proceeds to Step S314 and generates or selects an image by the live-action FV as the Nth frame of the output image.
In a case where it is determined that the live-action FV in the Nth frame does not satisfy the quality condition, the image processing unit 85 proceeds to Step S315, and generates or selects an image by CGFV as the Nth frame of the output image.
In Step S316, the image processing unit 85 confirms whether or not the variable N has reached the variable Nmax. If the variable N has not reached the variable Nmax, the variable N is incremented in Step S317, and the image processing unit 85 returns to Step S313.
Therefore, the live-action FV and the CGFV are selected for each frame according to the quality determination result.
When it is determined in Step S316 that the variable N has reached the variable Nmax, the image processing unit 85 proceeds to Step S318, generates a clip by the live-action FV and CGFV selected for each frame, and outputs the clip in Step S306.
Through the above processing, the live-action FV or CGFV is selected according to the quality determination for each frame, and a clip is generated and used as an output image.
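A per-frame version of the same decision, corresponding to Steps S310 to S318, might look like the following sketch; frame_quality_ok and the two render callables are again hypothetical stand-ins.

```python
def generate_fv_clip_per_frame(camera_path, frame_quality_ok,
                               render_live_frame, render_cg_frame):
    """Choose the source of each output frame according to the per-frame quality."""
    frames = []
    for n, viewpoint in enumerate(camera_path):      # N = 1 .. Nmax
        if frame_quality_ok(n, viewpoint):           # Step S313 -> S314
            frames.append(render_live_frame(n, viewpoint))
        else:                                        # Step S313 -> S315
            frames.append(render_cg_frame(n, viewpoint))
    return frames                                    # assembled into the clip (Step S318)
```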
In Step S320, the image processing unit 85 sets one frame to be processed. Since the process of
In Step S321, the image processing unit 85 acquires the EPTS data of the current frame to be processed.
In Step S322, the image processing unit 85 extracts the subject in the frame. For example, the image processing unit 85 extracts an image of a player or a ball.
In Step S323, the image processing unit 85 sets the variable M to "1", and sets the variable Mmax as the total number of subjects extracted in Step S322.
In Step S324, the image processing unit 85 determines whether or not the live-action FV satisfies the quality condition for an Mth subject. For example, as described with reference to
In a case where it is determined that the live-action FV of the Mth subject satisfies the quality condition, the image processing unit 85 proceeds to Step S325 and generates or selects the image of the Mth subject with the live-action FV.
In a case where it is determined that the live-action FV of the Mth subject does not satisfy the quality condition, the image processing unit 85 proceeds to Step S326 and generates or selects the image of the Mth subject by CGFV.
In Step S327, the image processing unit 85 confirms whether or not the variable M has reached the variable Mmax. If the variable M has not reached the variable Mmax, the image processing unit 85 increments the variable M in Step S328, and returns to Step S324.
Therefore, the live-action FV and the CGFV are selected for each subject according to the quality determination result.
When it is determined in Step S327 that the variable M has reached the variable Mmax, the image processing unit 85 proceeds to Step S329 and combines, with the background image, the live-action FV or CGFV image selected for each subject such as a player to generate an image of one frame.
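The per-subject combination of Steps S320 to S329 can be sketched as below: each extracted subject is taken from the live-action FV or the CGFV according to its own quality determination and composited over the background. All callables are hypothetical stand-ins for the rendering pipeline.

```python
def compose_frame(subjects, subject_quality_ok, live_image_of, cg_image_of,
                  background, paste):
    """subjects: list of subject identifiers extracted in Step S322.
    live_image_of/cg_image_of/paste are supplied by the rendering pipeline."""
    frame = background
    for subject in subjects:                 # M = 1 .. Mmax
        if subject_quality_ok(subject):      # Step S324 -> S325: live-action FV
            layer = live_image_of(subject)
        else:                                # Step S324 -> S326: CGFV
            layer = cg_image_of(subject)
        frame = paste(frame, layer)          # combined with the background (Step S329)
    return frame
```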
The above processing of
As described above,
These processes can also be combined.
For example, in Step S304 of
In addition, the quality determination processing for the live-action FV has been described, but the quality determination processing for the CGFV may be performed.
For example, the EPTS data may be insufficient or inaccurate for some reason. Therefore, it is also conceivable to determine whether or not the EPTS data is sufficient as the quality determination processing of the CGFV, and to select the live-action FV and the CGFV for the entire FV clip, for each frame, or for each subject on the basis of the determination.
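One simple form such a CGFV-side check could take is verifying that EPTS records cover the clip period densely enough, as in the sketch below; the record format, sampling rate, and coverage threshold are assumptions.

```python
def epts_sufficient(records, period_start, period_end,
                    expected_rate_hz=25.0, min_coverage=0.9):
    """records: list of dicts with at least a 'time' key (seconds).
    True when the period is covered densely enough to rely on the CGFV."""
    in_period = [r for r in records if period_start <= r["time"] <= period_end]
    expected = (period_end - period_start) * expected_rate_hz
    return expected > 0 and len(in_period) / expected >= min_coverage
```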
According to the above-described embodiments, the following effects can be obtained.
The image creation system 300 of the embodiment includes the EPTS data generation unit 22 (estimation information generation unit 4) that generates the EPTS data (estimation information) about the subject on the basis of at least one of the captured image and the sensor information. Furthermore, the image creation system 300 includes the FV render 12 (free viewpoint image generation unit 3) that generates the live-action player model (first three-dimensional model) that is a three-dimensional model of the subject on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints, and generates a free viewpoint image that is an image of an arbitrary viewpoint for the subject using the live-action player model. Furthermore, the image creation system 300 includes the 3D render 27 (three-dimensional image generation unit 5) that can generate a three-dimensional image on the basis of the EPTS data and the CG player model (second three-dimensional model) that is a virtual three-dimensional model of the subject.
The image creation system 300 can generate various image contents by switching and combining the free viewpoint image based on the live-action image and the three-dimensional image using the virtual three-dimensional model by using the EPTS data. For example, the FV render 12 can generate, as the free viewpoint image, an image of a free viewpoint without position constraint from the multi-viewpoint video by the plurality of imaging devices 10. Furthermore, the 3D render 27 can generate an image including a viewpoint that cannot be captured in reality, a subject that does not exist in reality, and the like. New and various image contents can be created by combining or selecting the images by the FV render 12 and the 3D render 27 on the basis of the EPTS data.
In the embodiment, the 3D render 27 can generate a 3D image using the live-action player model and the CG player model.
That is, the 3D render 27 can receive the live-action player model and the CG player model, and can generate a 3D image using these models. Therefore, as in the processing example of
In the embodiment, an example has been described in which the 3D render 27 generates a 3D image by selectively using the live-action player model and the CG player model for a specific subject.
For example, as described as the modification in
In the embodiment, an example has been described in which the 2D image generation unit 31 (two-dimensional image generation unit) that generates a 2D image by selectively using a live-action image including a free viewpoint image generated by the FV render 12 and a 3D image generated by the 3D render 27 is included.
The 2D render 31 receives the free viewpoint image or the 3D image, and can generate the 2D image by selectively using the free viewpoint image or the 3D image. As a result, the 3D image and the free viewpoint image obtained by the image creation system 300 in which the volumetric system 100 and the EPTS 200 are combined can be displayed in a 2D manner. That is, image display using a general display device can be realized.
In the embodiment, an example has been described in which the estimation information is EPTS data including position information of a person or an object as a subject in the captured image, or of a person or an object that can be sensed by the sensor 29.
In the 3D render 27, with use of the position information in the EPTS data, an image reflecting the position of the actual subject as a non-live-action image using CG can be generated.
In the embodiment, the estimation information is EPTS data including posture information of a person or an object as a subject in the captured image, or a person or an object that can be sensed by the sensor 29. In the 3D render 27, with use of the posture information in the EPTS data, an image reflecting the posture of the player and the like of the actual subject can be generated as a non-live-action image using the CG.
Note that the estimation information is not limited to the EPTS data. For example, the estimation information may be information that can be detected from an image by image analysis.
In the embodiment, as in the image G1 in
For example, the 2D render 31 can receive a live-action image including the free viewpoint image generated by the FV render 12 from the storage 16 and generate a 2D image to which an image effect is added on the basis of the effect designation information EF (see
Therefore, for example, image content can be created in which an image by an effect that cannot be actually viewed and a live-action image (particularly, a free viewpoint image) are combined.
In the embodiment, an example of generating an image viewed from a viewpoint position where the imaging device 10 is not arranged using the EPTS data has been described.
For example, in the 3D render 27, an image from an angle that cannot be captured in reality, such as the image G3 in
In the embodiment, an example of generating an image in which 3D images at a plurality of time points are combined on the basis of the EPTS data as in the image G4 of
For example, the 3D render 27 can generate an image obtained by combining images such as trajectories of players and balls at a plurality of time points on the basis of, for example, EPTS data in a period during a game or the like (see
In the embodiment, as in the image G5 in
For example, the 3D render 27 may generate an image representing the trajectory of movement of a player or ball on the basis of EPTS data for a period of time, e.g., a period of time during a game (see
In the embodiment, as in the image G6 in
For example, the 2D render 31 can combine a live-action image including the free-viewpoint image generated by the FV render 12 and an image showing the flow line of the subject generated by the 3D render 27 on the basis of the EPTS data (see
In the embodiment, as in the image G8 in
For example, the 2D render 31 can combine a live-action image including the free viewpoint image generated by the FV render 12 and an image obtained by converting numerical values based on the EPTS data into display data by the 3D render 27, for example, an image showing letters, numbers, or scales (see
In the embodiment, as in the image G9 (not illustrated), an example has been described in which an image is generated by combining an image based on the EPTS data with a live-action image including a free viewpoint image generated by the FV render 12 or with a 3D image generated by the 3D render 27.
For example, the 2D render 31 may combine additional images further generated by the 3D render 27 on the basis of the EPTS data to a live-action image including a free viewpoint image generated by the FV render 12 or a 3D image generated by the 3D render 27 (see
In the embodiment, an example has been described in which an image is generated by combining a live-action image including a free viewpoint image generated by the FV render 12 and a 3D image generated by the 3D render 27 on the basis of the EPTS data, such as the image G10, for example, the image G10-1 and the image G10-2 in
For example, the 2D render 31 can combine a live-action image including a free viewpoint image generated by the FV render 12 with a 3D image generated by the 3D render 27 on the basis of the EPTS data (see
Note that, for example, an additional image (an image of characters, numerical values, flow lines, trajectories, and so on) further generated by the 3D render 27 on the basis of the EPTS data may also be combined with the live-action image including the free viewpoint image generated by the FV render 12 and the 3D image generated by the 3D render 27 on the basis of the EPTS data.
The image creation system 301 of the embodiment includes the EPTS data generation unit 22 (estimation information generation unit 4) that generates the EPTS data (estimation information) regarding the subject on the basis of at least one of the captured image and the sensor information. Furthermore, the image creation system 301 includes the FV render 12 (free viewpoint image generation unit 3) that generates a three-dimensional model of the subject on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints, and generates a free viewpoint image that is an image of an arbitrary viewpoint for the subject using the three-dimensional model. The image creation system 301 also includes a 3D render 27 (three-dimensional image generation unit 5) capable of generating a three-dimensional image on the basis of the EPTS data and a three-dimensional model of the subject. The image creation system 301 further includes the 2D render 31 (output image generation unit 6) that generates an output image on the basis of the free viewpoint image generated by the FV render 12 and the three-dimensional image generated by the 3D render 27. The FV render 12 and the 3D render 27 can also function as the output image generation unit 6.
In the image creation system 301, in a case where the output image is generated on the basis of the free viewpoint image based on the live-action image and the three-dimensional image using the virtual three-dimensional model, for example, the output image can be generated by switching or combining the free viewpoint image and the three-dimensional image. For example, as shown as the sequence of
In the embodiment, an example has been described in which the output image generation unit 6 generates the output image by selectively using the live-action image including the free viewpoint image generated by the FV render 12 and the 3D image generated by the 3D render 27 (see
For example, the 2D render 31 functions as the output image generation unit 6 in
Such an output image generation unit 6 generates the output image by selectively using the live-action image (for example, the live-action FV) including the free viewpoint image generated by the FV render 12 and the 3D image (for example, the CGFV) generated by the 3D render 27, so that the image content in which the live-action image and the CG are combined can be output.
The example in which the output image generation unit 6 of the embodiment generates the output image by selectively using the live-action image including the free viewpoint image generated by the FV render 12 and the 3D image generated by the 3D render 27 for each period has been described (see
For example, the output image generation unit 6 selects the live-action FV and the CGFV for each frame period and generates an FV clip as an output image. This makes it possible to output image content as a live-action image during a certain period and as a CG during a certain period in a moving image.
The example in which the output image generation unit 6 of the embodiment generates the output image by combining the live-action image including the free viewpoint image generated by the FV render 12 and the 3D image generated by the 3D render 27 has been described (see
This makes it possible to output image content in which live-action (for example, live-action FV) and CG (for example, CGFV) are mixed in a frame in a moving image.
The example has been described in which the output image generation unit 6 of the embodiment generates the output image by combining the subject image by the live-action image including the free viewpoint image generated by the FV render 12 and the subject image by the 3D image generated by the 3D render 27 (see
For example, the output image generation unit 6 selects whether to use the image generated by the FV render 12 or the image generated by the 3D render 27 for each subject such as a player, and so on, and combines the images in one frame. This makes it possible to output the image content in which the live-action image and the CG are combined for each frame.
In the embodiment, an example has been described in which the output image generation unit 6 generates the output image selectively using the live-action image including the free viewpoint image generated by the FV render 12 and the 3D image generated by the 3D render 27 on the basis of the camera path of the free viewpoint image (see
Depending on the camera path, it can be determined which of the live-action image generated by the FV render 12 and the 3D image generated by the 3D render 27 is appropriate. Consequently, the image quality of the image content selectively using the live-action image and the CG can be improved.
In the embodiment, an example has been described in which the output image generation unit 6 performs the quality determination processing of the free viewpoint image, and generates the output image by selectively using the live-action image including the free viewpoint image generated by the FV render 12 and the three-dimensional image generated by the 3D render 27 according to the quality determination result (see
For example, through the quality determination processing, for example, it can be determined which of the live-action FV generated by the FV render 12 and the CGFV generated by the 3D render 27 is appropriate. Consequently, the image quality of the image content selectively using the live-action image and the CG can be improved.
In the quality determination processing of the embodiment, an example has been described in which the quality of the free viewpoint image is determined on the basis of the relationship of the arrangement of each of the plurality of imaging devices 10 (see
Since the quality of the free viewpoint image is improved when imaging is performed by a larger number of imaging devices 10, determining which of the live-action free viewpoint image and the CG image is suitable from the arrangement relationship of the imaging devices 10 is effective for improving the quality of the output image.
In the quality determination processing of the embodiment, an example has been described in which it is determined whether or not the subject to be the target of the free viewpoint image exists in the designated area DA in the visual field of the predetermined number or more of imaging devices 10 on the basis of the relationship of the arrangement of each of the plurality of imaging devices 10 (see
A designated area DA imaged by a larger number of imaging devices 10 is set in advance to determine whether or not the target player 90 is in the designated area DA in the free viewpoint image. This determines whether or not the live-action FV is qualitatively suitable.
In the quality determination processing of the embodiment, an example of determining a section in which the zoom magnification of the imaging device 10 is equal to or greater than a predetermined value in the camera path has been described (see
Even in the field of view of the imaging device 10, the image quality may deteriorate in a case where the zoom magnification is equal to or greater than the predetermined value. Therefore, a section in which the zoom magnification is equal to or greater than the predetermined value is determined in the camera path. This determines a frame section in which the live-action free viewpoint image is not qualitatively suitable.
In the quality determination processing of the embodiment, an example has been described in which the arrangement relationship between an object for which the free viewpoint image is generated and other objects in the image at the viewpoint defined by the camera path is determined.
For example, in a case where a free viewpoint image of a certain player is generated, the quality of the free viewpoint image changes depending on the arrangement state of the players in the image. Therefore, at each viewpoint defined by the current camera path, it is preferable to determine which of the live-action free viewpoint image and the CG image is suitable from the arrangement relationship between the target player and other players.
In the quality determination processing of the embodiment, an example of determining the degree of congestion of the subject around the subject for which the free viewpoint image is to be generated has been described.
In a case where the surroundings of the target player for which the free viewpoint image is to be generated are congested, the 3D model generation accuracy of the target player may decrease, and the quality of the free viewpoint image may also decrease. Therefore, determining the degree of congestion is appropriate for determining which one of the live-action free viewpoint image and the CG image is suitable.
Specifically, the degree of congestion is only required to be determined by determining whether or not the distance between the players is equal to or more than a certain value, or by determining the amount of occlusion.
The program according to the embodiment is a program for causing, for example, a CPU, a DSP, an AI processor, and the like, or an information processing device including the CPU, the DSP, the AI processor, and the like, to execute the processing illustrated in
That is, the program of the embodiment is a program causing an information processing device in an image creation system to execute processing including the steps of: generating estimation information regarding a subject on the basis of at least one of a captured image or sensor information; generating a first three-dimensional model, which is a three-dimensional model of the subject, on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints, and generating a free viewpoint image, which is an image of an arbitrary viewpoint of the subject, using the first three-dimensional model; and generating a three-dimensional image on the basis of the estimation information and a second three-dimensional model, which is a virtual three-dimensional model of the subject.
In addition, as the program according to the embodiment, there is also a program for causing, for example, a CPU, a DSP, an AI processor, and the like, or an information processing device including the CPU, the DSP, the AI processor, and the like, to execute the processing illustrated in
That is, the program according to the embodiment is a program for causing an information processing device in an image creation system to execute processing of generating estimation information regarding a subject on the basis of at least one of a captured image or sensor information, generating a three-dimensional model of the subject on the basis of a plurality of pieces of captured image data obtained by simultaneously capturing images from a plurality of viewpoints and generating a free viewpoint image that is an image of an arbitrary viewpoint for the subject by using the three-dimensional model, generating a three-dimensional image on the basis of the estimation information and the three-dimensional model of the subject, and generating an output image on the basis of the free viewpoint image and the three-dimensional image.
With such a program, a device configuring the image creation system 300 or 301 according to the embodiment can be realized in, for example, a computer device, a mobile terminal device, or another device capable of executing information processing.
Such a program can be recorded in advance in an HDD as a recording medium built in a device such as a computer device and the like, a ROM in a microcomputer having a CPU, and the like.
Alternatively, furthermore, the program can be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disk, a compact disc read-only memory (CD-ROM), a magneto optical (MO) disk, a digital versatile disc (DVD), a Blu-ray disc (registered trademark), a magnetic disk, a semiconductor memory, or a memory card. Such a removable recording medium can be provided as what is called package software.
Furthermore, such a program may be installed from the removable recording medium into a personal computer and the like, or may be downloaded from a download site through a network such as a local area network (LAN) or the Internet.
Furthermore, such a program is suitable for providing the information processing device 70 configuring the image creation system 300 or 301 according to the embodiment in a wide range. For example, the program is downloaded to a mobile terminal device such as a smartphone, a tablet, and the like, an imaging device, a mobile phone, a personal computer, a game device, a video device, a personal digital assistant (PDA), and the like, so that the smartphone and the like can be caused to function as the information processing device 70 configuring the image creation system 300 or 301 of the present disclosure.
Note that, the effects described in the present specification are merely examples and are not limited, and other effects may be provided.
Note that the present technology can also have the following configurations.
Priority application: Japanese Patent Application No. 2021-195212, filed in Japan in December 2021.
Filing document: PCT/JP2022/042999 (WO), filed on Nov. 21, 2022.