The present disclosure relates to a moving image generation apparatus, moving image generation method, program, and recording medium.
A technique is known for creating camerawork of a virtually present camera (hereinafter referred to as a virtual camera) with respect to a panoramic image, such as a spherical image, to generate a moving image having a normal angle of view. The camerawork is a set of camera parameters (a position, an orientation, a focal length, or the like) of the virtual camera that change in time series. By using camerawork, an information processing device or the like can display a moving image in which a panoramic image is viewed from various viewpoints.
Patent Document 1 discloses a technique for creating one moving image from a moving image captured by a multi-camera.
Patent Document 2 discloses a technique for generating a moving image in which a subject is tracked from a panoramic image.
Patent Document 3 discloses a technique for presenting a field of view as a moving image from a viewpoint moving along a predetermined route by cutting out a moving image from an omnidirectional moving image.
Patent Document 4 discloses a technique for extracting important regions from one spherical image to generate a moving image that transits among the important regions.
These techniques generate a normal moving image from a panoramic moving image by treating the panoramic moving image as a set of multiple panoramic still images.
However, in a moving image having a normal angle of view created from conventional panoramic images, the quality of the camerawork still needs to be improved from the viewpoint of the goodness-of-fit index for each panoramic image and the correlation between the panoramic images.
In consideration of the above, an object of the present invention is to provide a moving image generation apparatus capable of generating a high-quality moving image having a normal angle of view from a panoramic image.
In order to achieve the above objective, the present invention relates to a moving image generation apparatus including a goodness-of-fit index calculation unit configured to calculate goodness-of-fit indices of camerawork types of a virtual camera with respect to a plurality of panoramic images, a type allocation unit configured to determine an allocation of the camerawork types to the plurality of panoramic images based on the goodness-of-fit indices and a frequency of occurrence of each camerawork type, and a moving image encoding unit configured to create a moving image having a normal angle of view from the plurality of panoramic images with the allocated camerawork types.
A moving image generation apparatus capable of generating a high-quality moving image having a normal angle of view from a panoramic image can be provided.
Hereinafter, as an example of embodiments for implementing the present invention, a moving image generation apparatus and a moving image generation method performed by the moving image generation apparatus will be described.
Terms used in the present embodiment will be described first.
The term “camera parameters” refers to information that determines the viewpoint of a virtual camera in three-dimensional space, specifically, for example, one or more of the position, orientation, focal length, aspect ratio, and lens distortion of the virtual camera.
The term “camerawork” refers to values of camera parameters arranged in time series.
The term “virtual camera” does not refer to a camera that is present in reality but refers to a camera that is virtually present and generates a captured image according to defined camera parameter values.
The term “goodness-of-fit index” refers to an index that reflects the aesthetic property of a corresponding image, the recognizability of a target object, or the like, and that is subjectively set by a person. The goodness-of-fit index preferably indicates a higher value as the person finds the image more suitable.
The conventional technique relates to a method of generating a moving image with respect to a panoramic moving image by treating the panoramic moving image as a set of multiple panoramic still images to generate a normal moving image with respect to multiple panoramic images.
However, since scenes of the multiple panoramic images are not continuous, it is difficult for the moving image generation apparatus to adopt the conventional moving image generation method with respect to a panoramic moving image, which assumes a smooth change of the image between frames.
Further, there is a method of applying a conventional method of generating a moving image with respect to a single panoramic still image to each image and connecting the generated moving images. However, in this method, because each panoramic image is processed independently, a monotonous moving image may be generated due to similar camerawork being continuous.
In consideration of the above, in the present embodiment, when generating virtual camerawork with respect to a panoramic image to generate a moving image having a normal angle of view, similar camerawork types are prevented from being continuous by considering the goodness-of-fit index with respect to each panoramic image and the correlation between the panoramic images. Accordingly, a high-quality moving image having a normal angle of view can be generated from the panoramic image.
As a first embodiment, an example in which a moving image generation apparatus adaptively allocates multiple fixed camerawork to each panoramic image to generate one moving image will be described. First, the preconditions for the processing of the moving image generation apparatus will be described.
The term “panoramic image” refers to an image captured at a wide angle, and “captured at a wide angle” means that the image is captured with a wider angle of view than that of the output moving image. For example, in the present embodiment, even when the angle of view of the original image is 60 degrees, if the angle of view of the output moving image is less than 60 degrees, the original image is regarded as a panoramic image. Further, a spherical image captured in all directions of 360 degrees horizontally and 180 degrees vertically is also a panoramic image.
It is assumed that lens distortion and internal parameters of the camera that captured the panoramic image have been calibrated by another means, and that the relative projection direction of each pixel of the panoramic image in three-dimensional space is known. At this time, the panoramic image can be re-projected onto the two-dimensional unit sphere. Formally, the projection direction of pixel i is (x_i, y_i, z_i) ∈ R³ (where x_i² + y_i² + z_i² = 1), and the pixel value of pixel i may be assigned to the position (x_i, y_i, z_i) on the two-dimensional unit sphere. Although the positions of the pixels are discrete, the moving image generation apparatus can allocate the pixel values continuously on the two-dimensional unit sphere by interpolating using the nearest neighbor method, the bilinear method, the bicubic method, or the like. This two-dimensional unit sphere is hereinafter referred to as an “image sphere”.
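As an illustration, the re-projection onto the image sphere can be sketched in Python for an equirectangular panoramic image (the equirectangular layout and the function name are assumptions for illustration; the embodiment only requires that the projection direction of each pixel be known):

```python
import math

def equirect_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a direction (x, y, z)
    on the two-dimensional unit sphere (the "image sphere").
    Longitude spans [-pi, pi), latitude spans [-pi/2, pi/2]."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)

# Every returned direction has unit norm, so the pixel value of
# pixel (u, v) can be assigned to that position on the unit sphere.
x, y, z = equirect_to_sphere(960, 240, 1920, 960)
assert abs(x * x + y * y + z * z - 1.0) < 1e-9
```

Interpolation (nearest neighbor, bilinear, bicubic) would then be applied over these discrete positions to obtain a continuous assignment on the image sphere.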
Next, the camerawork is defined. The term “camerawork” refers to time series data of parameters of the virtual camera that converts a panoramic image into a partial perspective projection image.
Further, the moving image generation apparatus 100 includes a CPU 11, a ROM 12, a RAM 13, an input device 14, a display device 15, and a connection I/F 16.
The CPU 11 centrally controls the operation of the moving image generation apparatus 100. The CPU 11 executes various control programs stored in the ROM 12 by using a predetermined area of the RAM 13 as a workspace to implement various functions of the moving image generation apparatus 100. Specific details of the functions of the moving image generation apparatus 100 will be described later.
The ROM 12 is a nonvolatile memory (non-rewritable memory) that stores programs and various kinds of setting information relating to the moving image generation apparatus 100. The RAM 13 is, for example, a storage device such as a synchronous dynamic random access memory (SDRAM). The RAM 13 functions as a workspace of the CPU 11 and temporarily stores programs and various kinds of data.
The input device 14 is a peripheral device used for accepting an input (an operation on a keyboard or a mouse, a voice-based operation, or the like) from a user. The display device 15 is a component or external peripheral device that displays various kinds of information relating to the moving image generation apparatus 100. The display device 15 is, for example, a liquid crystal display. Note that a touch panel to which the display device 15 and the input device 14 are integrated together may be used. The connection I/F 16 is an interface that connects the moving image generation apparatus 100 to an external device. For example, the connection I/F 16 may be a general-purpose interface such as a Universal Serial Bus (USB) interface or a communication interface that enables wired or wireless communication with an external device.
Next, the functions of the moving image generation apparatus 100 according to the present embodiment will be described.
A storage device 20 stores the multiple panoramic images to be processed.
Since the functions of the moving image generation apparatus 100 correspond to respective steps of a flowchart, the functions will be described along with the processing steps.
In processing step S101, the image acquisition unit 101 acquires multiple panoramic images to be processed, from the storage device 20.
In processing step S102, the importance calculation unit 102 calculates an importance of each region in the obtained panoramic image. The importance calculation unit 102 preferably calculates the importance for each pixel, but the importance may be calculated with sub-pixel accuracy or may be calculated for each specific region.
The importance is an index representing how important it is for a viewer to view a region, and is approximately calculated by one of the following methods. One method uses visual prominence detection. Visual prominence detection includes bottom-up methods that stack individual rules, such as setting high prominence for an edge or an isolated point, and top-down methods that estimate prominence from input images by using a machine learning method such as a neural network or a support vector machine. Any of these methods may be used in the present embodiment. As an alternative to visual prominence, a technician may set object categories with high importance in advance (for example, a person, a face, an animal, a car, or the like), and high importance may be assigned to a region in which such an object is detected by an object detection algorithm. Alternatively, the importance calculation unit 102 may estimate the layout of the scene in the panoramic image and estimate that the importance is high in a direction in which a radial composition, a rule-of-thirds composition, a horizontal composition, or the like of the scene is obtained. Alternatively, the importance calculation unit 102 may use the attention obtained by an attention mechanism of a neural network as the importance. The attention mechanism refers to a mechanism for learning the relationship between elements, introduced in the Encoder-Decoder model mainly for the purposes of machine translation and image processing.
Note that the above-mentioned visual prominence detection, object detection, layout estimation, and attention mechanism can also be used in combination. The importance I(v) ∈ R of the direction v ∈ S² (where S² is the two-dimensional unit sphere) can be obtained through this step. In the following, I: S² → R is called the importance distribution I.
Here, the importance distribution is treated as follows.
1. The importance distribution is a combination of multiple probability distributions.
2. The individual probability distributions that make up the importance distribution are referred to as element distributions.
In processing step S103, the goodness-of-fit index calculation unit 103 calculates the goodness-of-fit index of the camerawork type prepared in advance from the importance distribution I of the panoramic image.
Each of these camerawork types has different features: (a) is suitable for generating a moving image of a horizontally wide view, and (b) and (c) are suitable for generating a moving image overlooking a vertically wide field of view.
The goodness-of-fit index calculation unit 103 calculates the goodness-of-fit index of each camerawork type with respect to each panoramic image according to the importance distribution I. As a specific example, the importance distribution I can be modeled by the mixed von Mises-Fisher distribution, and the goodness-of-fit index of each camerawork type can be calculated from the parameters of the model. The mixed von Mises-Fisher distribution is a probability distribution having the probability density function of Math. 1 for a random variable v ∈ S².
Each coefficient is as follows: α_k is the mixing ratio of the kth element distribution (α_k ≥ 0, Σ_k α_k = 1), μ_k ∈ S² is its mean direction, and κ_k ≥ 0 is its degree of concentration. In the mixed von Mises-Fisher distribution, {α_k, μ_k, κ_k}_{k=1}^K are the parameters.
The goodness-of-fit index calculation unit 103 obtains these parameters by maximum likelihood estimation with respect to the importance distribution I. Specifically, the goodness-of-fit index calculation unit 103 obtains {α_k, μ_k, κ_k}_{k=1}^K that maximizes the log likelihood of Math. 3.
The maximization of the log likelihood of Math. 3 can be efficiently calculated by the Expectation-Maximization (EM) algorithm. The goodness-of-fit index calculation unit 103 then calculates the goodness-of-fit index of each camerawork type from the obtained {α_k, μ_k, κ_k}_{k=1}^K. For example, the goodness-of-fit index of the horizontal movement camerawork type (a) is obtained by subtracting the average of the inner products of the mean directions μ_k and the gravity direction g ∈ S² from 1 (the value increases as the importance distribution spreads in the horizontal direction). For stereographic projection rotation in the gravity direction (b), the goodness-of-fit index is the average of the inner products of the mean directions μ_k and the gravity direction g ∈ S² (the value increases as the element distributions spread in the gravity direction). For stereographic projection rotation in the reverse gravity direction (c), the goodness-of-fit index is the average of the inner products of the mean directions μ_k and the reverse gravity direction −g ∈ S² (the value increases as the element distributions spread in the reverse gravity direction).
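The three indices above can be sketched in a few lines, assuming the mean directions {μ_k} have already been fitted (the function and key names are illustrative, not part of the embodiment):

```python
def fit_indices(means, g):
    """Goodness-of-fit indices of the three fixed camerawork types,
    from the mean directions {mu_k} of a fitted von Mises-Fisher
    mixture and the gravity direction g (all unit 3-vectors)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    avg = sum(dot(mu, g) for mu in means) / len(means)
    return {
        "horizontal_move": 1.0 - avg,        # type (a): 1 - mean <mu_k, g>
        "rotate_gravity": avg,               # type (b): mean <mu_k, g>
        "rotate_reverse_gravity": -avg,      # type (c): mean <mu_k, -g>
    }

# A mean direction aligned with gravity favors type (b):
idx = fit_indices([(0.0, 0.0, 1.0)], (0.0, 0.0, 1.0))
assert abs(idx["rotate_gravity"] - 1.0) < 1e-12
```

Horizontal mean directions (orthogonal to g) instead drive the type (a) index toward its maximum of 1.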
The method of calculating the goodness-of-fit index is not limited to these, and any method may be used as long as the camerawork type can be evaluated in a form that follows the importance distribution. The goodness-of-fit index of the mth camerawork type with respect to the nth panoramic image obtained as described above is represented as D_{nm}.
In processing step S104, the type allocation unit 104 allocates a camerawork type to each panoramic image using the calculated goodness-of-fit indices {D_{nm}} (n = 1, …, N; m = 1, …, M). Here, N is the number of panoramic images, and M is the number of camerawork types. If the same camerawork type occurs many times, the whole moving image lacks variation. Therefore, the technician determines in advance the minimum number B of occurrences of each camerawork type (the frequency of occurrence of each camerawork type).
By optimizing the goodness-of-fit indices over all images while considering the frequency of occurrence of each camerawork type, a suitable camerawork type can be allocated to each image without bias toward a particular type. For example, although Image 1 is suitable for camerawork type A when viewed alone, Image 2 may have a higher goodness-of-fit index for camerawork type A than Image 1. In that case, an interaction occurs in which camerawork type A is allocated to Image 2, which has the higher goodness-of-fit index, and camerawork type B is allocated to Image 1. Accordingly, the same camerawork type is not repeated continuously, and both the goodness-of-fit index for each panoramic image and the correlation between the panoramic images are taken into account.
Then, the type allocation unit 104 finds an allocation of the camerawork types to the panoramic images such that the sum of the goodness-of-fit indices is maximized. The allocation can be formulated as the following integer programming problem by introducing a variable τ_{nm} that takes 1 when the mth camerawork type is assigned to the nth panoramic image and 0 otherwise (Math. 4).
The Branch-and-Bound method and metaheuristics (simulated annealing, genetic algorithms, tabu search, or the like) can be used to solve the integer programming problem. As a special case, when N=M and B=1, the exact solution can be obtained by the Hungarian algorithm.
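The N = M, B = 1 special case can be illustrated with a small sketch in which exhaustive search over permutations stands in for the Hungarian algorithm (the function name is illustrative; the example data mirrors the Image 1 / Image 2 interaction described above):

```python
from itertools import permutations

def allocate_exact(D):
    """Exact camerawork-type allocation for the special case N = M,
    B = 1 (each type used exactly once): maximize the sum of
    goodness-of-fit indices D[n][m] over all assignments.
    The Hungarian algorithm solves this in O(N^3); exhaustive
    search is used here only for clarity."""
    n = len(D)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        s = sum(D[i][perm[i]] for i in range(n))
        if s > best:
            best, best_perm = s, perm
    return list(best_perm)

D = [[0.90, 0.80],   # Image 1: type A (index 0) fits well, but...
     [0.95, 0.10]]   # Image 2 fits type A even better
# Image 2 takes type A; Image 1 falls back to type B (index 1).
print(allocate_exact(D))  # -> [1, 0]
```

Maximizing the sum (0.80 + 0.95 = 1.75) beats the greedy per-image choice (0.90 + 0.10 = 1.00), which is exactly the interaction the allocation step exploits.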
In processing step S105, according to the camerawork specified by the allocated camerawork type, the moving image encoding unit 105 creates a moving image of partial perspective projection from each panoramic image, and encodes and outputs the result as one moving image. As described above, a high-quality moving image having a normal angle of view can be generated from panoramic images.
Hereinafter, variations suitable for the first embodiment will be described.
In the parameter estimation of the mixed von Mises-Fisher distribution in processing step S103, the estimation result can be stabilized by using Bayes estimation. That is, the goodness-of-fit index calculation unit 103 sets a prior distribution with respect to the parameters {α_k, μ_k, κ_k}_{k=1}^K to obtain the posterior distribution of the parameters. An approximate calculation can be implemented by using variational Bayes estimation. Further, the goodness-of-fit index of a camerawork type can be calculated directly from the importance distribution without going through the parameter estimation of the distribution. For example, in the case of the horizontal movement camerawork type (a), the goodness-of-fit index calculation unit 103 may calculate the goodness-of-fit index by integrating the importance I(v) of the direction v ∈ S² using a weight related to the inner product with the gravity direction g ∈ S², as in Math. 5.
[Math. 5]
D(I) = ∫_{S²} (1 − ⟨v, g⟩) I(v) dv  (5)
The same idea can be used for the camerawork types (b) and (c) of stereographic projection rotation in the gravity (reverse gravity) direction, and the importance may be integrated by weighting with the inner product with the gravity (reverse gravity) direction.
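The direct integral of Math. 5 can be approximated numerically on a latitude-longitude grid, as in the following sketch (the grid discretization and the function name are illustrative assumptions):

```python
import math

def horizontal_fit_index(I, g, n_lat=64, n_lon=128):
    """Numerically approximate D(I) = integral over S^2 of
    (1 - <v, g>) I(v) dv, using a midpoint latitude-longitude grid
    with spherical area element cos(lat) dlat dlon."""
    total = 0.0
    d_lat = math.pi / n_lat
    d_lon = 2.0 * math.pi / n_lon
    for i in range(n_lat):
        lat = -math.pi / 2 + (i + 0.5) * d_lat
        for j in range(n_lon):
            lon = -math.pi + (j + 0.5) * d_lon
            v = (math.cos(lat) * math.cos(lon),
                 math.cos(lat) * math.sin(lon),
                 math.sin(lat))
            w = 1.0 - sum(a * b for a, b in zip(v, g))
            total += w * I(v) * math.cos(lat) * d_lat * d_lon
    return total

# Sanity check: for a uniform importance I(v) = 1 / (4*pi), the
# <v, g> term integrates to zero, so D(I) should be close to 1.
val = horizontal_fit_index(lambda v: 1.0 / (4.0 * math.pi), (0.0, 0.0, 1.0))
assert abs(val - 1.0) < 1e-2
```

A peaked importance near the zenith (aligned with g) would instead drive the index toward 0, matching the interpretation of type (a).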
In processing step S104, a weight {w_{nm} ∈ R} (n = 1, …, N; m = 1, …, M) may be introduced into the evaluation function to adjust the selected camerawork types. Specifically, the evaluation function of Math. 4 is replaced with Math. 6.
According to the evaluation function of Math. 6, by adjusting the weight {w_{nm} ∈ R} (n = 1, …, N; m = 1, …, M), the frequency of occurrence of a specific camerawork type can be increased, or the frequency of occurrence of a camerawork type can be changed according to the order of the images. With regard to constraints, it is also useful to add the constraint of Math. 7 so that the same camerawork type does not continue.
[Math. 7]
τ_{n,m} + τ_{n+1,m} ≤ 1  (n = 1, 2, …, N−1; m = 1, 2, …, M)  (7)
Further, in the above example, the goodness-of-fit index calculation unit 103 is heuristically designed to calculate the goodness-of-fit index. However, the goodness-of-fit index calculation function may instead be obtained by machine learning, because manually designing the function takes time and effort and its accuracy is limited. The technician presents examinees with moving images generated from a set of pre-created images and camerawork types, and asks the examinees to provide values for the goodness-of-fit index, thereby constructing a dataset of multiple {image, camerawork type, goodness-of-fit index} groups. Then, a regression model D for estimating the goodness-of-fit index from {image, camerawork type} can be learned. For the regression, linear regression, logistic regression, support vector regression, gradient boosting, a neural network, or the like may be used.
According to the first embodiment described above, by allocating a camerawork type to each panoramic image based on the goodness-of-fit index and the frequency of occurrence of the camerawork type, a moving image by various camerawork types can be generated with respect to multiple panoramic images.
Further, the moving image generation apparatus incorporates the frequency of occurrence of each camerawork type into the constraints or the evaluation function and determines the allocation of the camerawork types to the panoramic images such that the sum of the goodness-of-fit indices of the camerawork types is maximized. As a result, suitable camerawork can be generated with respect to each panoramic image while preventing bias toward a particular camerawork type.
Further, by calculating the goodness-of-fit index of the camerawork type according to the importance distribution in the image, camerawork that is adaptive to the content of the image can be generated.
As a second embodiment, a moving image generation apparatus that defines a camerawork type by classifying a set of parameters given to a camerawork generation module (camerawork generation unit 205 which will be described later) to generate one moving image from multiple panoramic images will be described.
Although each camerawork type is fixed to one camerawork in the first embodiment, in the present embodiment, different camerawork is generated according to the content of the panoramic image even if the camerawork types are the same.
In processing step S203, the goodness-of-fit index calculation unit 203 calculates the goodness-of-fit index of each camerawork type prepared in advance from the importance distribution I of the panoramic image. In the second embodiment, a camerawork type is defined by a parameter (hereinafter referred to as a generation parameter) of the camerawork generation unit 205, which generates camerawork from the importance distribution.
Different camerawork is generated for the same importance distribution due to the difference in generation parameters. The generation parameter includes the number of intermediate compositions, a method of path interpolation, acceleration/deceleration rules, and the like.
(d) The goodness-of-fit index calculation unit 203 generates a path in which a minute variation is given to one intermediate composition, and moves the virtual camera on the path at a constant speed.
(e) The goodness-of-fit index calculation unit 203 generates a movement path by linear interpolation for two intermediate compositions, and moves the virtual camera on the path with acceleration/deceleration.
(f) The goodness-of-fit index calculation unit 203 generates a movement path by spline interpolation for three intermediate compositions, and moves the virtual camera on the path at a constant speed.
The goodness-of-fit index calculation unit 203 calculates the goodness-of-fit index from the importance distribution with respect to these camerawork types. First, the goodness-of-fit index calculation unit 203 models the importance distribution with the mixed von Mises-Fisher distribution in the same manner as in processing step S103, and obtains the parameters {α_k, μ_k, κ_k}_{k=1}^K by maximum likelihood estimation.
Here, the number of element distributions K is set to 3, the maximum number of intermediate compositions among the assumed camerawork types.
With respect to the camerawork type (d), the goodness-of-fit index is the difference between the maximum value of the mixing ratios {α_k}_{k=1}^K and the average of the other values. That is, the goodness-of-fit index becomes higher as the importance is concentrated on one element distribution.
With respect to the camerawork type (e), the goodness-of-fit index is the difference between the average and the minimum value of the mixing ratios {α_k}_{k=1}^K. That is, the goodness-of-fit index becomes higher as the importance is concentrated on two element distributions.
With respect to the camerawork type (f), with σ² the variance of the mixing ratios {α_k}_{k=1}^K, the goodness-of-fit index is exp(−σ²). That is, the goodness-of-fit index becomes higher as the dispersion of the mixing ratios is smaller.
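The three mixing-ratio criteria above can be sketched directly (function and key names are illustrative; the input is the fitted mixing ratios with K = 3):

```python
import math

def type_indices(alpha):
    """Goodness-of-fit indices of camerawork types (d)-(f) from the
    mixing ratios {alpha_k} of the fitted mixture (sum(alpha) = 1)."""
    K = len(alpha)
    a_max = max(alpha)
    others = list(alpha)
    others.remove(a_max)                 # remove one occurrence of the max
    mean = sum(alpha) / K
    var = sum((a - mean) ** 2 for a in alpha) / K
    return {
        "d_one_composition": a_max - sum(others) / len(others),
        "e_two_compositions": mean - min(alpha),
        "f_three_compositions": math.exp(-var),
    }

# Importance concentrated on one element distribution favors type (d);
# a uniform mixture gives the maximum type (f) index exp(0) = 1.
peaked = type_indices([0.8, 0.1, 0.1])
uniform = type_indices([1 / 3, 1 / 3, 1 / 3])
assert peaked["d_one_composition"] > uniform["d_one_composition"]
```

Importance spread over two element distributions (for example, mixing ratios near 0.45, 0.45, 0.1) similarly raises the type (e) index relative to the uniform case.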
The goodness-of-fit index of the mth camerawork type for the nth panoramic image obtained as described above is represented as D_{nm}.
In processing step S204, the type allocation unit 204 allocates a camerawork type to each panoramic image using the calculated goodness-of-fit indices {D_{nm}} (n = 1, …, N; m = 1, …, M). This processing is the same as processing step S104 in the first embodiment.
In processing step S205, the camerawork generation unit 205 generates camerawork with respect to each panoramic image based on the allocated camerawork type. Here, an example will be described in which the camerawork generation unit 205 generates camerawork using the parameters {α_k, μ_k, κ_k}_{k=1}^K of the mixed von Mises-Fisher distribution obtained in processing step S203.
That is, the camerawork generation unit 205 generates the camerawork by receiving as inputs the panoramic image and the parameters related to the importance distribution of each region of the panoramic image. The camerawork generation unit 205 sets the generation parameter according to the camerawork type allocated by the type allocation unit 204.
With regard to the camerawork type (d), the camerawork generation unit 205 first selects the element distribution k that maximizes the mixing ratio α_k. The mean direction μ_k of this element distribution is set as the optical axis direction c of the virtual camera, and the camerawork generation unit 205 determines the angle of view γ of the virtual camera from the degree of concentration κ_k. For example, Math. 8 may be used as a formula for calculating the angle of view γ of the virtual camera from the degree of concentration κ.
Here, η ∈ (0, 1) is a hyperparameter, and the angle of view γ becomes greater as η becomes greater, with γ → 0 as κ → ∞ and γ → 2π as κ → 0. This formulation is derived from the relational expression of the cumulative density function with respect to the central angle of the von Mises-Fisher distribution.
The camerawork generation unit 205 gives small variations to the optical axis direction c and the angle of view γ to obtain (c_1, γ_1) and (c_2, γ_2). Then, the camerawork generation unit 205 linearly interpolates these two points to generate the camerawork path c(s) and γ(s). That is, the optical axis direction and the angle of view form a pair, and the camerawork generation unit 205 generates camerawork in which the parameters of the virtual camera transit between different pairs of optical axis direction and angle of view.
Here, s ∈ [0, 1]. In the camerawork type (d), since the virtual camera moves at a constant speed, s is set as in Math. 10 for the times t = 0, 1, …, T.
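Math. 9 and Math. 10 are not reproduced here; the following sketch assumes a plausible reading in which Math. 9 linearly interpolates the (optical axis, angle of view) pair and Math. 10 is the uniform parameterization s = t/T (the function name and the renormalization step are illustrative):

```python
import math

def constant_speed_path(p0, p1, T):
    """Linear interpolation between two compositions p0 = (c0, gamma0)
    and p1 = (c1, gamma1) at constant speed, s = t / T for
    t = 0, 1, ..., T. The interpolated optical axis is renormalized
    so that it stays on the unit sphere (undefined for antipodal c0, c1)."""
    (c0, gam0), (c1, gam1) = p0, p1
    frames = []
    for t in range(T + 1):
        s = t / T
        c = [(1 - s) * a + s * b for a, b in zip(c0, c1)]
        norm = math.sqrt(sum(x * x for x in c))
        c = tuple(x / norm for x in c)       # keep the optical axis on S^2
        frames.append((c, (1 - s) * gam0 + s * gam1))
    return frames

path = constant_speed_path(((1.0, 0.0, 0.0), 1.0), ((0.0, 1.0, 0.0), 0.5), 10)
assert len(path) == 11 and path[0] == ((1.0, 0.0, 0.0), 1.0)
```

For type (d), p0 and p1 would be the two slightly varied compositions (c_1, γ_1) and (c_2, γ_2) described above.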
Next, with respect to the camerawork type (e), the camerawork generation unit 205 first selects the two element distributions k_1 and k_2 with the highest mixing ratios among {α_k}_{k=1}^K. The camerawork generation unit 205 sets the mean directions μ_{k1} and μ_{k2} of these element distributions as the optical axis directions c_1 and c_2 of the virtual camera, and calculates the angles of view γ_1 and γ_2 of the virtual camera from the degrees of concentration κ_{k1} and κ_{k2} by Math. 8.
The intermediate composition 1 is (c_1, γ_1) and the intermediate composition 2 is (c_2, γ_2). The camerawork generation unit 205 uses Math. 9 to generate a path that transits between the two intermediate compositions by linear interpolation. Next, the camerawork generation unit 205 determines s with respect to the time t. Here, the camerawork generation unit 205 sets s as in Math. 11 by using a minimum jerk model as an acceleration/deceleration model.
The minimum jerk model generates a trajectory between two points such that the integral of the square of the jerk, which is the derivative of acceleration, is minimized. The minimum jerk model is known to adequately reproduce the movement of the human hand. The acceleration/deceleration model is not limited to the minimum jerk model, and any model may be used.
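The standard closed form of the minimum jerk time scaling, which Math. 11 plausibly corresponds to, is the quintic s(u) = 10u³ − 15u⁴ + 6u⁵ with u = t/T; velocity and acceleration vanish at both endpoints:

```python
def minimum_jerk(t, T):
    """Minimum jerk time scaling s(t) = 10u^3 - 15u^4 + 6u^5,
    u = t / T. Starts and ends at rest (zero velocity and zero
    acceleration at t = 0 and t = T), minimizing integrated
    squared jerk between the two compositions."""
    u = t / T
    return 10 * u ** 3 - 15 * u ** 4 + 6 * u ** 5

# The virtual camera accelerates out of composition 1 and
# decelerates into composition 2; the midpoint is exactly s = 0.5.
assert minimum_jerk(0, 10) == 0.0
assert abs(minimum_jerk(5, 10) - 0.5) < 1e-12
assert abs(minimum_jerk(10, 10) - 1.0) < 1e-12
```

Substituting this s(t) into the linear interpolation of Math. 9 yields the accelerating/decelerating transit of camerawork type (e).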
With respect to the camerawork type (f), the camerawork generation unit 205 sets the mean directions {μ_k}_{k=1}^K of the element distributions as the optical axis directions {c_k}_{k=1}^K of the virtual camera and calculates the angles of view {γ_k}_{k=1}^K of the virtual camera from the degrees of concentration {κ_k}_{k=1}^K by Math. 8. Using these as intermediate compositions, the camerawork generation unit 205 generates a path of camera parameters passing through the intermediate compositions by spline interpolation, and the virtual camera moves on the path at a constant speed.
In processing step S206, according to the set camerawork, a moving image encoding unit 206 creates a moving image of partial perspective projection from each panoramic image, and encodes and outputs the moving image as one moving image.
Hereinafter, variations suitable for the second embodiment will be described.
As in the first embodiment, it is possible to apply the variational Bayes method to the parameter estimation of the mixed von Mises-Fisher distribution, to calculate the goodness-of-fit index directly from the importance distribution, and to weight the evaluation function.
Further, the fixed camerawork of the first embodiment may be mixed with the camerawork of the second embodiment. Various camerawork types can be assumed depending on the combination of the number of intermediate compositions, a method of path interpolation, and acceleration/deceleration rules. Any other camerawork type can be considered (however, inconvenience occurs when the number of camerawork types is large, and it is preferable to apply the following third embodiment to solve the inconvenience).
According to the present embodiment, various camerawork can be generated for multiple panoramic images according to the content of the images.
The camerawork generation unit generates camerawork by receiving as inputs the panoramic image and the parameters related to the importance distribution of each region of the panoramic image, so that camerawork adaptive to the content of the image can be generated while preventing bias toward a particular camerawork type.
By regarding the importance distribution in a panoramic image as a mixture of multiple probability distributions (element distributions) and generating camerawork according to the parameters of the element distributions, camerawork that is robustly adaptive to the content of the image can be generated.
As a third embodiment, a moving image generation apparatus that allocates a camerawork type to each panoramic image to generate and output camerawork will be described for the case where the number of camerawork types is very large relative to the number of input images or, as an extreme example, where there are infinitely many camerawork types.
In such a case, there is a disadvantage that a moving image in which similar camerawork continues may be generated, because similar camerawork types are allocated to different panoramic images.
In the third embodiment, the moving image generation apparatus defines a distance between camerawork types and incorporates the distance into a constraint or an evaluation function to generate various types of camerawork.
In processing step S303, the goodness-of-fit index calculation unit 303 calculates the goodness-of-fit index of each camerawork type from the importance distribution I of the panoramic image. Here, the camerawork type (f) of the second embodiment is used as an example. The moving image length T in Math. 10 is used as a variable to classify camerawork types (for example, T = 300 and T = 500 are different camerawork types).
When classifying camerawork types, the extent of the importance distribution is considered. For example, as in processing step S103, the goodness-of-fit index calculation unit 303 obtains the parameters {α_k, μ_k, κ_k}_{k=1}^K of the mixed von Mises-Fisher distribution with respect to the importance distribution I by maximum likelihood estimation. The variance of {μ_k}_{k=1}^K is used as the goodness-of-fit index of the camerawork type. In the following, the goodness-of-fit index of the camerawork type m for the nth panoramic image is represented as D_{nm} (in the above example, it is assumed that m = T).
In processing step S304, a type allocation unit 304 allocates camerawork to each panoramic image using the goodness-of-fit index Dnm calculated in processing step S303. Here, the type allocation unit 304 uses a distance d(m, m′) between camerawork types m and m′ as a constraint. Typically, the L2 distance, the L1 distance, or the like may be used.
Letting mn be the camerawork type allocated to the nth panoramic image, the optimization problem of Math. 12 is constructed.
The type allocation unit 304 solves the optimization problem to obtain the allocation {mn} (n=1, 2, . . . , N). If the camerawork type is the moving image length T as in the above example, the problem becomes an integer programming problem and may be solved by a branch-and-bound method or metaheuristics (or by the following methods after linear relaxation). If mn is a real number, the problem is solved by constrained nonlinear programming such as the penalty function method, the method of Lagrange multipliers, or the generalized reduced gradient method.
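For a small number of images and camerawork types, the integer allocation problem can be solved exactly by exhaustive search. The following toy sketch is an assumption for illustration (the fit indices, type values, and the form of the distance constraint between consecutive images are all hypothetical, not taken from Math. 12):

```python
import itertools

# Toy instance: N = 3 panoramic images, M = 3 camerawork types
# (hypothetical moving image lengths T). D[n][m] is the fit index D_nm.
D = [[0.9, 0.8, 0.1],
     [0.8, 0.9, 0.2],
     [0.1, 0.2, 0.9]]
types = [300, 400, 500]   # hypothetical values of T
d_min = 50                # constraint: consecutive allocations must differ by >= d_min

def distance(m, m2):      # L1 distance between camerawork types
    return abs(types[m] - types[m2])

best, best_score = None, float("-inf")
for assign in itertools.product(range(3), repeat=3):
    # discard allocations violating the distance constraint
    if any(distance(a, b) < d_min for a, b in zip(assign, assign[1:])):
        continue
    score = sum(D[n][m] for n, m in enumerate(assign))  # sum of fit indices
    if score > best_score:
        best, best_score = assign, score
```

The constraint rules out runs of identical types, so the maximizer assigns each image its best-fitting type while keeping consecutive types apart.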
In processing step S305, a camerawork generation unit 305 generates camerawork with respect to each panoramic image based on the allocated camerawork type. This processing is the same as processing step S205 in the second embodiment.
In processing step S306, according to the set camerawork, a moving image encoding unit 306 creates a moving image of partial perspective projection from each panoramic image, and encodes and outputs the moving image as one moving image.
As for the constraint of processing step S304, instead of imposing a constraint on the distance between camerawork types, the distance can be incorporated into the evaluation function as illustrated in Math. 13.
In Math. 13, λ is a hyperparameter that adjusts the balance between the goodness-of-fit index and the distance between camerawork types. The optimization problem may be solved by an unconstrained nonlinear programming method (steepest descent method, Newton's method, conjugate gradient method, downhill simplex method, and the like) to obtain the camerawork type allocation.
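The penalized variant can be sketched as follows. This block is a self-contained assumption for illustration: the fit indices and type values are toy numbers, and it is assumed that the distance term in Math. 13 rewards diversity of consecutive camerawork types with weight λ (the exact form of Math. 13 is not reproduced here):

```python
import itertools

# Toy fit indices D_nm and hypothetical moving image lengths T.
D = [[0.9, 0.8, 0.1],
     [0.8, 0.9, 0.2],
     [0.1, 0.2, 0.9]]
types = [300, 400, 500]
lam = 0.001               # hyperparameter balancing fit against type distance

def objective(assign):
    fit = sum(D[n][m] for n, m in enumerate(assign))
    div = sum(abs(types[a] - types[b]) for a, b in zip(assign, assign[1:]))
    return fit + lam * div  # unconstrained: lambda alone sets the balance

# No hard constraint; every allocation is feasible and scored.
best = max(itertools.product(range(3), repeat=3), key=objective)
```

Unlike the constrained formulation, no allocation is infeasible here; increasing λ shifts the optimum toward more varied camerawork at the expense of per-image fit.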
Further, in the third embodiment, the moving image length T is used as the camerawork type, but the camerawork type is not limited to this. Anything may be used as long as the goodness-of-fit index of a camerawork type and the distance between camerawork types can be defined. For example, the number of intermediate compositions or the hyperparameter η that determines the angle-of-view conversion of Math. 8 may be used as the camerawork type, and the number of intermediate compositions and the hyperparameter η may also be used in combination.
According to the present embodiment, it is possible to generate diverse and high-quality camerawork for multiple panoramic images even when many camerawork types exist. Here, high quality means that the goodness-of-fit index with respect to each panoramic image is high and that the quality is high from the perspective of the correlation between the panoramic images.
Further, the moving image generation apparatus incorporates the distance between camerawork types into the constraints or the evaluation function and determines the allocation of the camerawork type to each panoramic image such that the sum of the goodness-of-fit indices of the camerawork types is maximized. As a result, suitable camerawork can be generated with respect to each panoramic image while preventing the allocation from being biased toward similar camerawork types.
As a fourth embodiment, an example in which a Conditional Generative Adversarial Network (hereinafter, abbreviated as CGAN) is used for camerawork generation will be described. In the framework of CGAN, a panoramic image x and a latent variable z can be input to a generating function G to obtain camerawork y=G(x, z). Since the variation of the camerawork can be obtained by the latent variable z, the latent variable z can be regarded as the camerawork type in the fourth embodiment.
Further, in CGAN, a discriminant function D of the camerawork is acquired in the process of learning, and the value of D(x, y) increases as the generated camerawork y becomes closer to the camerawork included in the training data. In the fourth embodiment, D(x, y) is regarded as the goodness-of-fit index.
As the distance between camerawork types, the distance in the space of the latent variable z is used. In such a way, even when CGAN is used as a camerawork generation method, the camerawork can be generated in consideration of the goodness-of-fit index with respect to each panoramic image and the correlation between the panoramic images.
In processing step S403, a type allocation unit 404 allocates camerawork to each panoramic image. As in the third embodiment, the camerawork type is allocated by solving an optimization problem that involves the distance between camerawork types. In the fourth embodiment, the optimization problem of Math. 14 is constructed.
In Math. 14, it is assumed that a set of images is X, a set of camerawork types is Z ⊂ R^Dz, a set of camerawork is Y, xn ∈ X, zn ∈ Z (n=1, 2, . . . , N), and a generating function G: X×Z→Y and a discriminant function D: X×Y→[0, 1] are obtained in advance by adversarial training in CGAN.
The discriminant function D outputs a value close to 1 if a pair of input image and camerawork is close to the training data, and close to 0 if the pair is far from the training data. This is regarded as the goodness-of-fit index of the camerawork type.
Further, a distance d in the space of the latent variable (camerawork type) is set as a constraint. The type allocation unit 404 obtains the allocation of the camerawork type to the panoramic image that maximizes the sum of the goodness-of-fit index, with the distance d as a constraint.
Random search, hill climbing, or metaheuristics may be used to solve the optimization problem. Since the calculation cost of computing the goodness-of-fit index in advance for every combination of panoramic image and camerawork type, as in the first, second, and third embodiments, is high (and if the cardinality of Z is infinite, such precomputation is impossible), the goodness-of-fit index calculation unit 403 calculates D(xn, G(xn, zn)) whenever the value of the discriminant function D is required in the optimization process.
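The on-demand evaluation can be sketched with random search. The functions G and D below are simple stand-ins assumed for this sketch (the real functions are obtained by adversarial training), and the toy "images" and latent dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the CGAN generator G and discriminator D (assumptions for
# this sketch only; not the trained networks of the embodiment).
def G(x, z):
    return z                                       # "camerawork" is the latent itself

def D(x, y):
    return float(np.exp(-np.sum((y - x) ** 2)))    # value in (0, 1]

X = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]   # two toy "images"
d_min = 0.5                 # minimum latent-space distance between the types

best_z, best_score = None, -1.0
for _ in range(2000):       # random search over latent codes z_n
    zs = [rng.uniform(-2, 2, size=2) for _ in X]
    if np.linalg.norm(zs[0] - zs[1]) < d_min:      # distance constraint
        continue
    # D is evaluated on demand here, never precomputed over all of Z
    score = sum(D(x, G(x, z)) for x, z in zip(X, zs))
    if score > best_score:
        best_z, best_score = zs, score
```

Each candidate allocation calls D only when its score is needed, which is what makes the search feasible over a continuous (uncountable) set Z.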
In processing step S404, the camerawork generation unit 405 generates camerawork with respect to each panoramic image based on the allocated camerawork type. That is, G(xn, zn) (n=1, 2, . . . , N) is calculated with respect to zn obtained by optimization.
In processing step S405, according to the set camerawork, a moving image encoding unit 406 creates a moving image of partial perspective projection from each panoramic image, and encodes and outputs the moving image as one moving image.
Any configuration can be used for the generating function G and the discriminant function D. As a configuration example of the generating function G, after combining the camerawork types in the channel direction of the input image, a feature value may be calculated by processing with a Convolutional Neural Network (CNN), and the camerawork that is time series data may be output by Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or Transformer.
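The combination of the camerawork type with the input image in the channel direction, described above, can be sketched at the shape level. The concrete dimensions below are hypothetical:

```python
import numpy as np

# Hypothetical shapes: image with C channels, latent camerawork type of
# dimension Dz (channels-first layout).
C, H, W, Dz = 3, 64, 128, 8
x = np.zeros((C, H, W))                 # panoramic image
z = np.arange(Dz, dtype=float)          # latent camerawork type

# Broadcast each latent component to a constant spatial plane, then
# concatenate with the image in the channel direction -> CNN input.
z_planes = np.broadcast_to(z[:, None, None], (Dz, H, W))
g_input = np.concatenate([x, z_planes], axis=0)   # shape (C + Dz, H, W)
```

The CNN then sees the camerawork type at every spatial location, and its feature value can be handed to the recurrent or Transformer stage that emits the time-series camerawork.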
The discriminant function D takes image data and time series data as input, and extracts feature values using a CNN for the image data and an RNN, GRU, LSTM, or Transformer for the time series data. The discriminant function D then preferably combines these feature values and processes the combined feature value with fully connected layers.
In the above example, Euclidean space is assumed for the set of camerawork types Z, but the set Z is not limited to this. When the set Z is composed of a small, countable number of elements, the goodness-of-fit index may be calculated in advance of the optimization calculation as in the first, second, and third embodiments.
According to the fourth embodiment, by using CGAN to generate a moving image for multiple panoramic images, CGAN can construct a camerawork generating function and a goodness-of-fit index calculation function (discriminant function) in a data-driven manner. Therefore, if a sufficient amount of training data exists, diverse and high-quality camerawork can be generated.
While the embodiments of the present disclosure have been described above, the present disclosure is not limited to such embodiments. Thus, various modifications and replacements may be made within the scope not departing from the gist of the present disclosure.
For example, the present embodiments are applicable not only to the generation of a 3D CG moving image but also to a virtual reality system.
A moving image generated according to the present embodiments is applicable to advertisements. In addition, the present embodiments allow a viewer to effectively browse merchandise and services by distributing the moving image at various electronic commerce (EC) sites.
In addition, in the present embodiments, some processing is implemented by machine learning. Machine learning is a technique allowing a computer to obtain a learning ability such as that of a person and is a technique for autonomously generating an algorithm used for determination such as data classification from training data that is obtained in advance and for applying the algorithm to new data to perform prediction. The machine learning method may be any of supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning, or any combination of these. That is, any machine learning method may be used.
The configuration examples illustrated in
Each of the functions of the above-described embodiments may be implemented by one or more pieces of processing circuitry. Here, the term “processing circuitry” used herein refers to a processor that is programmed to carry out each function by software, such as a processor implemented by an electronic circuit, or a device such as an application specific integrated circuit (ASIC), a digital signal processor (DSP), or a field programmable gate array (FPGA) that is designed to carry out each function described above, or a conventional circuit module.
The above-described embodiments are illustrative and do not limit the present invention. Thus, numerous additional modifications and variations are possible in light of the above teachings. For example, elements and/or features of different illustrative embodiments may be combined with each other and/or substituted for each other within the scope of the present invention.
The present application is based on and claims priority to Japanese Patent Application No. 2020-217197, filed on Dec. 25, 2020, the contents of which are incorporated herein by reference in their entirety.
[Patent Literature 1] Japanese Patent No. 6432029
[Patent Literature 2] Japanese Laid-Open Patent Application Publication No. 2004-241834
[Patent Literature 3] Japanese Patent No. 5861499
[Patent Literature 4] Japanese Laid-Open Patent Application Publication No. 2018-151887
Number | Date | Country | Kind |
---|---|---|---|
2020-217197 | Dec 2020 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/047061 | 12/20/2021 | WO |