The present invention relates to a method and a device for deforming shapes in images.
Shape deformation is a classical problem in computer graphics and animation that comes in a variety of forms and settings. Essential to all variants is the requirement to compute a map between different spaces with some desirable properties. Common choices include maps between curved manifolds embedded in R3, volumetric maps between solid shapes in R3, and mixed dimensional maps where, for example, a curved 2-manifold embedded in R3 is mapped to R2 (a parameterization).
Depending on the precise user requirements, even a simplified setting using planar maps, where both the source and target domains are subsets of R2, is challenging to deal with.
Some requirements such as smoothness are easily achievable while other, such as local injectivity or strict bound on the induced metric distortion are hard, sometimes impossible to guarantee. The goal is to be able to compute a map that adheres to these positional constraints yet at the same time is visually plausible. Plausibility is usually quantified by smoothness and the amount of metric distortion, which can be either minimized on the average, or bounded above by some user-defined amount. Moreover, the map is expected to be locally injective, preventing degeneracies of the image and local overlaps while allowing global ones. These requirements pose a great computational challenge since they usually lead to a nonconvex optimization problem with many degrees of freedom for which finding an optimal solution (or even just a feasible one) is often intractable.
Recent advances in the computation of locally injective and/or bounded distortion maps allow artists to produce high quality results with greater than ever success rates but the performance of current methods is still significantly worse compared to simple linear methods (Weber Ofir, Ben-Chen Mirela, and Gotsman Craig. 2009. Complex Barycentric Coordinates with Applications to Planar Shape Deformation. Computer Graphics Forum 28, 2 (2009), 587-597).
It is therefore an object of the invention, to provide a computationally efficient method and a device for shape deformation of an image that produces visually plausible results.
This object is achieved by a method and a computer program product according to the independent claims. Advantageous embodiments are defined in the dependent claims.
According to the invention, the deformation problem is cast as a nonlinear nonconvex unconstrained minimization problem that is performed in a reduced deformable subspace of harmonic maps. The user is allowed to pick among a variety of smooth energies that measure isometric distortion.
The method according to the invention broadens the scope of previous research on harmonic maps from simply-connected to multiply-connected domains (disks with finite number of holes) such as those appearing in
A custom-made Newton solver is constructed with features efficiently addressing the deformation problem. The quality of the produced maps is superior and comparable to the other methods that operate within the harmonic subspace. However, the inventive method is orders of magnitude faster than existing techniques. The optimization is based on a second-order Newton approach which ensures that the convergence rate is high. In general, a single Newton iteration can be slow due to various reasons such as the need to approximate the Hessian numerically using finite differences (when analytical expressions are unavailable), as well as the need to modify the Hessian such that it becomes positive definite, which is necessary to ensure convergence to a local minimum.
The inventive solver overcomes these issues due to several unique properties, such as having closed-form expressions for the map, and its differentials, and an analytic modification scheme for the Hessian to ensure its positive definiteness. Combining these benefits leads to a Newton iteration that is considerably faster than a standard Newton iteration. Together with the small number of iterations that are typically required for convergence, the total running time of our solver is significantly shorter compared to state-of-the-art solvers. Moreover, every iteration of our solver is embarrassingly parallelized and perfectly suites implementation on massively parallel graphics processing units due to the structure and regularity of the problem. This allows for unprecedented performance.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
These and other aspects and advantages of the present invention will become more apparent when considering the following detailed description of an embodiment of the invention, in connection with the drawing, in which:
According to an embodiment of the invention, an interactive deformation method is described that is driven by user specified positional constraints. That is, the user selects a small set of points in the source domain which is visualized as a textured image, and interactively drags them around to new positions.
According to the present embodiment of the invention, the image deformation map is a harmonic planar map.
A real-valued function u(x,y): Ω→R, is harmonic if it satisfies the Laplace equation uxx+uyy=0. A planar harmonic map is a map ƒ: Ω→R2 where ƒ=[u(x,y),v(x,y)], and the component functions u and v are harmonic. When R2 is identified with the complex plane C using complex notation z=x+iy, one can express f as a complex-valued function ƒ(z)=u(z)+iv(z). Thus a complex valued harmonic function ƒ: Ω⊂C→C can be interpreted as a harmonic planar map. A complex-valued function ƒ=u+iv is said to be holomorphic if it satisfies the Cauchy-Riemann equations. ƒ(z) is called anti-holomorphic if
The Wirtinger derivatives of a complex-valued function are defined as follows: ƒz:=½(ƒx−iƒy) and ƒ
Using the Wirtinger derivatives, the Cauchy-Riemann equations are expressed concisely as ƒ
The Wirtinger derivatives are useful for defining many of the local geometric distortion quantities of a C1 planar map. The local change in area is: det(Jƒ)=|ƒz|2−|ƒ
Isometric and conformal distortion measures are typically formulated in terms of the singular values of Jf, 0≤σ2≤σ1:
σ1(z)=|ƒz|+|ƒ
For orientation preserving locally injective maps, the expression on the right simplifies to σ2(z)=|ƒz|−|ƒ
The invention generalizes existing techniques that are designed for simply-connected domains to multiply-connected domains. The construction is based on the following theorem, proven by the inventors:
Theorem (Harmonic Decomposition). Let Ω be a multiply connected planar domain with N holes K1, . . . , KN, and choose N arbitrary points ρi in Ki. Then, any harmonic map f: Ω→C can be represented as:
where {tilde over (Φ)}, {tilde over (Ψ)}: Ω→C are holomorphic functions, and ω1, . . . , ωN are some complex coefficients.
The harmonic decomposition (3) is unique up to an additive complex constant that can be chosen by setting e.g. {tilde over (Ψ)}(z0)=0 for an arbitrary point z0∈Ω. Theorem Harmonic Decomposition extends a similar result for simply-connected domains for which the summation term in (3) is missing. While {tilde over (Φ)}(z)+
The main theoretical result in [Chen and Weber 2015] is a boundary value characterization of the injectivity and the conformal and isometric distortion of planar harmonic maps on simply-connected domains. We generalize this result to the multiply-connected case:
Theorem (Bounded Distortion). A planar harmonic map ƒ: Ω→C on a multiply-connected domain with exterior boundary curve γ0 oriented counterclockwise and interior boundary curves γ1 . . . γN oriented clockwise, is locally injective with an upper bound k∈[0, 1) on the conformal distortion, a lower bound σ2>0 on the small singular value of the Jacobian, and an upper bound σ1<∞ on the large singular value at every point z in Ω if and only if:
Intuitively, Theorem Bounded Distortion states that in order to induce global distortion bounds, it is sufficient to bound the distortion of a harmonic map on the boundary curves γi as long as (4a) holds. The boundary integral condition (4a) is equivalent to the condition that ƒz is nonvanishing throughout the domain. Namely that ƒz(z)≠0. In contrast to the simply-connected domain, the integral over the exterior boundary γ0 may be nonzero. This gives rise to maps which are not regular homotopic to the identity map but can still be locally injective.
As opposed to [Chen and Weber 2015], the inventive deformation algorithm does not explicitly impose user-defined bounds k, σ1, σ2 on the distortion. Instead, an overall lower average distortion is favored. By using energies that become infinite when the map degenerates at a boundary point, a finite bound on distortion is naturally formed on the boundary. Theorem Bounded Distortion then assures that this bound is also global. At each iteration, the inventive algorithm certifies the map as locally injective by verifying that (4a) holds, as well as the following:
|ƒz(w)|>|ƒz(w)| ∀w∈∂Ω. (5)
How to enforce (4a) and (5) in practice is explained in the following description.
According to the present embodiment, equation (3) is discretized to obtain a finite dimensional space of harmonic maps. The holomorphic functions {tilde over (Φ)} and {tilde over (Ψ)} are discretized by using the Cauchy complex barycentric coordinates [Weber et al, 2009] which are derived by approximating the boundary of the domain using a polygonal shape. Let {circumflex over (P)}={z1, z2, . . . , zm}, zj∈C be the vertices of a multiply-connected planar polygon (the cage). The vertices of the cage are split into sets such that the first set represents the exterior polygon oriented counterclockwise while the following sets represent the interior polygons oriented clockwise.
The holomorphic functions {tilde over (Φ)} and {tilde over (Ψ)} are represented as:
where {tilde over (C)}j(z) is the jth holomorphic Cauchy barycentric coordinate associated with vertex z, and φj, ψj are complex coefficients.
{tilde over (C)}j(z) possess a rather simple closed-form expression (see Weber Ofir. 2010. Hybrid Methods for Interactive Shape Manipulation. Ph.D. Dissertation. Technion—Israel Institute of Technology). While Weber et al. [2009] assumed that {circumflex over (P)} is simply connected, their derivation can be extended to the case of multiple boundary components due to the fact that Cauchy's integral formula still holds. This can be done by integrating the Cauchy kernel over all the boundary components. Practically, representing holomorphic functions on such a domain simply boils down to using additional basis functions that correspond to vertices that lie on the interior boundaries. The complex barycentric map induced by the Cauchy coordinates on multiply-connected domains has a nonempty null space of (complex) dimension 2N which corresponds to a similarity transformation of the vertices of the interior boundary polygons. To remove this ambiguity, the first two complex coefficients of each hole are fixed to zero.
The holomorphic functions {tilde over (Φ)} and {tilde over (Ψ)} in the harmonic decomposition (3) are substituted with the expressions from (6).
The additional term Σi=1NLn|z−ρi| in (3) is substituted with the expression:
where φj, ψj are 2N additional complex coefficients. Splitting ωi into two variables creates some unnecessary redundancy, nevertheless it greatly simplifies the exposition as well as the implementation.
Optionally, this redundancy may be removed by enforcing a constraint φj=, j=m+1, . . . , m+N for each hole. Finally, rearranging terms and denoting n=m+N leads to:
According to the invention, Equation (8) is the closed-form expression for a finite dimensional harmonic subspace with 2n complex variables {φj,j}j=1n. The dimension of the (complex) null space is 4N+N+1, where 4N is due to the 2 complex DOF per hole for each of the two Cauchy barycentric maps (as explained above), N is due to the redundancy introduced by Equation (7) and the term 1 is due to the constant DOF of the harmonic decomposition (3).
By linearity of the Wirtinger operators, the fact that the derivative of the Cauchy coordinates with respect to z is 0, and assuming that φj=, ∀j=m+1, . . . , n the Wirtinger derivatives of the harmonic map, just like the map itself, can be expressed in closed-form:
The derivatives of the Cauchy barycentric coordinates {tilde over (D)}j(z) are holomorphic and so is
hence it is clear that fz and
The main approach in interactive shape deformation is to design and minimize a distortion energy that aggregates a differential quantity that strives to keep the map as-isometric-as-possible. Such differential quantities are minimized when the map is locally isometric. However, since a perfect isometry cannot be obtained in general in the presence of positional constraints, the particular definition of proximity to isometry greatly affects the end result. It is a common practice to measure distortion at a point z as a function E(z)=E(σ1(z), σ2(z)) of the two singular values of the Jacobian matrix Jf(z). Such a function is invariant to rigid motions of both the source and the target domains. E(z) should be minimized (at a single point z) if σ1(z)=σ2(z)=1. Energies that minimize conformal distortion are also popular. Our harmonic subspace contains many pure (with zero distortion) conformal maps, hence, if desired, we can simply restrict the subspace by eliminating the ψj variables and use any of our isometric energies to regularize the result.
Appropriate choices of isometric energy for designing locally injective maps become infinite when the map collapses locally, which serves as a natural barrier term. Since our optimization depends on high order derivatives, we focus on energies which are smooth in our variables φj, ψj. To this end, the distortion measure at z is expressed as a smooth function of |fz|2 and |ƒ
The motivation to use Eexp is to penalize more drastically high values of the distortion measure.
According to the invention, the isometric distortion of a planar harmonic map f is defined as a boundary integral over the pointwise distortion quantity E(w):
E
ƒ=∂ΩE(w)ds. (12)
The superscript f is used to denote the boundary integrated distortion and omitted in order to denote the pointwise quantity. Theorem Bounded Distortion ensures that for any bound on E(w) along the boundary, a global upper bound on σ1(z), and a global lower bound on σ2(z) exist. Hence, a global bound on E(z) is naturally formed. Alternatively, an area integral may be chosen instead of the boundary one.
The gradient and the Hessian of the overall isometric energy are:
∇Eƒ=∂Ω∇E(w)ds. (13)
∇2Eƒ=∂Ω∇2E(w)ds. (14)
The gradient and the Hessian of our energy measures have relatively simple closed-form expressions. Let D=D1, D2, . . . , Dn)∈C1×n be a complex row vector, where Dj is defined as in (9). Bold symbols are used to denote real vectors and matrices. Define the real matrix D (note the bold symbol) as:
The complex Wirtinger derivatives are expressed as 2×1 real vectors:
The gradient of E with respect to the 4n real variables is:
where α1, α2 are real parameters which depend on the particular choice of energy. Table 1 provides the parameters needed to instantiate some particular energies (out of many possible). The expression for the 4n×4n Hessian of E(z) at a single point is:
where K is the a 4×4 real matrix:
The parameters β1, β2, β3 are also energy dependent (see Table 1).
The point-to-point (P2P) metaphor is an intuitive drag-and-drop user-interface that best fits interactive deformation tasks. The user of the inventive method can add or remove (by clicking) positional constraints at any time during interaction. Dragging the P2P handles signals the application to invoke the optimization and to render the updated result. The P2P constraints are incorporated as soft constraints. The P2P energy is defined as:
where pi∈P⊂Ω is the position of the ith handle in the source domain, and qi∈C is its target desired position. The gradient and the Hessian of Ep2pƒ are derived as follows.
The P2P energy at a single handle can be expressed using complex numbers:
E
p2p(pi)=½|ƒ(pi)−qi|2=½|C(pi)φ+−qi|2, (F.1)
where C(pi)=(C1(pi), C2(pi), . . . , Cn(pi))∈C1×n is a complex row vector, where is defined in (8), and φ,ψ∈Cn×1. The complete energy is given by the sum:
Equation (F.1) can be expressed using real numbers:
where Q(pi) is a real matrix defined as:
Ep2p(pi) is a quadratic form with constant PSD Hessian matrix:
∇2Ep2p(pi)=QT(pi)Q(pi)∈4n×4n (F.3)
The gradient of the quadratic form is:
In order to sum the gradients and the Hessians over all the P2P handles P, one constructs a matrix Q by stacking Q(pi):
Finally, the full Hessian and full gradient are given by:
where q is a real column vector stacking the target positions:
q=└Re(q1),Im(q2), . . . ,Re(q|P|),Im(q|P|)┘T∈2|P|×1 (F.8)
The full energy of the unconstrained minimization problem is:
E
Def
ƒ
=E
ƒ
+λE
p2p
ƒ, (20)
where λ is a user-defined weight that balances the two terms.
The Hessian of Ep2pƒ is positive semi-definite (PSD). However, the Hessian of Ef is, in general, not PSD, and neither is the Hessian of EDefƒ (Equation (20)). The key observation here is that the closest (in Frobenius norm) PSD matrix to the Hessian ∇2E(z)∈4n×4n at a single point z, can be expressed in closed-form. With that at hand Equation (14) is substituted with:
∇2Eƒ+=∂Ω∇2E+(w)ds, (21)
where ∇2E+(w) is the closest PSD matrix to ∇2E(w). Since, the integral (or sum) of PSD matrices is PSD (due to the convex cone structure of the PSD set), it is clear that the modified Hessian of the isometric energy (21) is guaranteed to be PSD. Further, the dimension of the null space of (21) in the case of Eiso and Eexp is 2 or 3. Therefore, with 2 or more positional constraints, ∇2Eƒ+ is nonsingular.
This application refers to the present variant of Newton's method that computes the closest PSD matrix to ∇2Eƒ as Newton-Eigen. The modified Newton iterations are dramatically faster to compute. Moreover, throughout extensive experiments, the inventors observed that the local modification leads to iterations which are more effective, where less iterations are required for convergence (
Computing the eigen-factorization of the local Hessian matrix ∇2E numerically is impractical. In order to derive an analytic eigen-factorization of a 4n×4n matrix, it is first observed that each of the nontrivial eigenvectors (with nonzero eigenvalue) of ∇2E can be expressed as a product of
and a corresponding eigenvector of K (Equation (19)). This is due to the fact that B has 4 rows that are orthogonal to each other (easy to verify by computing the inner product of each pair) and all the rows have the same norm. Furthermore, the eigenvalues of K and ∇2E are the same up to a positive scale. Hence, the above problem can be reduced to analytically computing the eigenvalues of a 4×4 matrix. This is still challenging as these are the roots of a 4th order polynomial. Denote the (unsorted) eigenvalues of K as (λ1, λ2, λ3, λ4). It is further observed that λ1=2α1 and λ2=2α2. To see why 2α1 is an eigenvalue, subtract 2α1 from the diagonal of K. The first two rows of K−2α1I are:
ƒz└4β1ƒzT 4β3
where we can see that the 1×4 row vector on the right hand side of (22) is multiplied by two scalars (the elements of fz), hence, these two rows are linearly dependent, meaning that the matrix K−2α1I is singular and 2α1 is a root of the characteristic polynomial. Similarly, it can be shown that 2α2 is an eigenvalue. Knowing that 2α1 and 2α2 are two eigenvalues of K, the other two eigenvalues one can compute by directly solving the quartic characteristic equation. The derivation is long but straightforward, hence omitted. One obtains these expressions:
λ3,4=s1±√{square root over (s22+16β32|ƒz|2|ƒ
where s1,2=α1+2β1|ƒz|2±(α2+2β2|ƒ
The corresponding four eigenvectors are:
The signs of these eigenvalues depend on the particular choice of isometric energy, therefore in the most general case, the 4 eigenvalues are evaluated and negative ones are substituted with 0 to obtain the modified Hessian. For particular energy choices, it is possible to simplify matter even more. For the first two energies listed in Table 1, only λ1=2α1 can be (at times) negative. This allows to directly express the modified matrix. For example, for Eiso, it turns out that λ3, λ4 have quite simple expressions:
λ3=4(1+3(|ƒz|+|ƒ
The spectrum is then sorted as follows: 2α1≤λ3≤2α2≤λ4. With this information at hand, the modification is done by checking for the sign of α1, and if it is negative, K is substituted in (18) with:
To ensure that the map is locally injective at each iteration, it needs to be verified that Conditions (4a) and (5) hold, otherwise one needs to backtrack during line search. As (5) involves infinite number of inequalities: |ƒz(w)|>|ƒ
Condition (5) can be enforced on the entire boundary by enforcing it (individually) on many small boundary segments. For each segment [vi,vi+1] (see
Lƒ
|ƒz(vi)|−|ƒ
or more concisely:
σ2(vi)+σ2(vi+1)≥(Lƒ
It is crucial that the sufficient condition above is as tight as possible, as otherwise the Newton step size may become too small and will stop the Newton iterations prematurely. The condition becomes tighter as the Lipschitz constants and/or the length of the segment 1 decrease. Since denser sampling requires more computations, it is advised to obtain as small as possible Lipschitz constants.
According to the invention, the following Lipschitz constants may be used:
where
d(z) is the distance from a point z to the segment,
Turning now to Condition (4a) requiring that the boundary integral (or equivalently the number of zeros of fz) is 0. The following condition is applicable to multiply-connected domains, and is both necessary and sufficient as long as (27) holds:
Theorem 8.1. Let g(z) be an L-Lipschitz continuous holomorphic function in a neighborhood of a simple open curve with length l and two endpoints vi,vi+1∈C such that:
|g(vi)+|g(vi+1)|>Ll, (28)
then:
where Arg is the principle branch of the complex argument function.
Theorem 8.1 is applicable to fz along any line segment [vi,vi+1] since the inventive algorithm always verifies that (27) holds (which implies (28)). Consequently, under the assumption that (27) holds, fz does not vanish inside the multiply-connected polygon P if and only if:
where the first summation is over the polygonal loops, and the second one is over the segments in each loop.
To conclude, the method first verifies that (27) holds on all the boundary segments and if so, the sum in (30) is simply computed. If it is zero, the map is guaranteed to be locally injective everywhere.
In general, Newton's method searches for a stationary point of a scalar function E(x) of a vector variable x∈Rn by iteratively approximating it using its second order Taylor expansion:
E(x)=E(xk+Δx)≈E(xk)+ΔxT∇E(xk)+½ΔxT∇2E(xk)Δx
where Δx=x−xk∈Rn is the search direction, and ∇E∈Rn and ∇2E∈Rn×n denotes the gradient and the Hessian of E respectively. This second order polynomial in Δx has vanishing derivative when the following linear system is satisfied:
∇2E(xk)Δx=−∇E(xk) (1)
The direction Δx decreases the energy only if the second order polynomial is convex, or equivalently, the Hessian matrix ∇2E is positive definite. Hence, when needed, the Hessian is modified. The solution to the linear system produces the update for the current iteration xk+1=xk+Δx. To ensure convergence (to a local minimum) from an arbitrary starting point, a line search along the search direction Δx should be used to find a step size t that satisfies the Wolfe conditions.
While the map and its differentials possess closed-form expressions, the contour integrals that arise in Equations (12), (13), (14) may not be solved analytically; therefore, the method resorts to numerical integration. Let G={w1, w2, . . . , w|G|}⊂∂Ω an be a set of uniformly distributed samples. Applying the trapezoidal rule to (12), it boils down to a simple sum, due to the uniform sampling:
The gradient of Ef is approximated in the same fashion:
However, it turns out that for the integral approximation of the Hessian, much coarser approximation can be used with negligible influence on the number of Newton iterations (see Figure D.). Hence, the method uses an additional set H of samples which is a subset of G but contains significantly less samples. The Hessian is approximated:
where ∇2E+(wi) is the analytic modification of the pointwise Hessian. We note on a theoretical level, that for any number |H| (as small as it may be), the Newton step produces a descent direction since the matrix ∇2Ef in (33) is guaranteed to be PSD (strict positive definiteness can be ensured by adding E to the diagonal). In the extreme case that H is an empty set, the algorithm degrades to gradient descent. The inventors observed that in practice, choosing |H|=0.1 |G| does not degrade the effectiveness of the Newton iterations and provides a good balance between the time it takes to perform the numerical integration and the other steps of the algorithm. This setting is used in all the results. Moreover, since in practice |H| is much larger than the dimension of our subspace, it is not necessary to add E to the diagonal.
To ensure convergence to a local minimum from an arbitrary starting point the method uses simple backtracking [Nocedal Jorge and Wright Stephen. 2006. Numerical Optimization. Springer New York, Algorithm 3.1] to find a step size that sufficiently decreases the energy (Algorithm 1, Line 17) and in addition ensures that the new map will be locally injective using the above-described strategy (Algorithm 1, Line 18). This requires evaluations of the energy and the Wirtinger derivatives at different step sizes.
The algorithm shown in
The GPU implementation proposed by the invention is designed in a way that utilizes standard libraries as much as possible. The implementation mostly involves only dense linear algebra operations, while the parts that require coding of a specialized kernel are relatively simple, embarrassingly parallel operations. The inventors have utilized the cuBLAS (cuBLAS. 2017. CUDA Basic Linear Algebra Subroutines. http://developer.nvidia.com/cublas. (2017)), and cuSolverDN (cuSOLVER. 2017. CUDA Linear Solvers Library. littvgdevelopor.nvidia.comlcusoiver. (2017)) (BLAS and LAPACK implementations) dense linear algebra libraries that are freely distributed with NVIDIA's CUDA toolkit with double precision accuracy. This not only simplifies the implementation, but also ensures that the code will be optimized for different GPU architectures and will run smoothly on any NVIDIA device. Moreover, as graphics hardware constantly evolves and these libraries get optimized further, the speed of such implementation is expected to automatically improve over time.
Regarding matrix and vector notations, the Wirtinger derivatives at a single point can be expressed concisely in vector form:
ƒz(z)=D(z)φ,
where D(z)=(D1(z), D2(z), . . . , Dn(z))∈C1×n is a complex row vector, while φ=(φ1, φ2, . . . , φn)T and ψ=(ψ1, ψ2, . . . , ψn)T are complex column vectors.
The complex-valued function f and its differentials have to be evaluated at a large collection of points inside the domain. It would be convenient to express these in matrix notations. For example, assume that one wants to evaluate the map f on a set of points V∈Ω for the sake of visualization.
In the complex matrix C(V)∈C|V|×n the ith row of C corresponds to the point vi∈V such that Ci,j=Cj (vi). Similarly, the complex matrix D(V)∈C|V|×n may be used such that Di,j=Dj (vi). Using these notations, one may evaluate the map fat the points of the set V by computing the matrix product:
followed by the computation of the vector sum Φ+
In the preprocessing step (Algorithm 1, line 1), the polygon P is triangulated with a dense triangulation whose vertices form the set V. The triangulation may be used for visualization purposes. Hardware texture mapping may be used to render the final deformed image. The rendering can be performed efficiently on the GPU with extremely high mesh- and texture resolutions without affecting the runtime of the solver. The following matrices are computed and stored on the device's memory:
C(V)∈C|V|×n C(P)∈C|P|×n
D(G)∈C|G|×n D(H)∈C|H|×n (G.4)
where V contains the vertices of the triangulation used for texture mapping, P is the set of positional constraints, G is a dense set of samples lying on the polygon P which is used for energy and gradient evaluation, and H is a subset of G used to evaluate the Hessian of the energy. Once the complex matrix D(H) is constructed, it is converted to real form D∈R2|H|×2n (where the bold symbol denotes a real matrix) as follows:
where the subscripts indicates row numbers such that Di is the ith row of the complex matrix D. Another closely related matrix with the same size which we denote as {circumflex over (D)} is also stored and is obtained from D by swapping each two pair of rows. That is, the 1st row is swapped with the 2nd one, the 3rd with the 4th and so on.
Next the implementation computes in parallel a matrix with the coefficients
These will be used in runtime to compute the Lipschitz constants of the Wirtinger derivatives. Unlike the Lipschitz constants, these depend only on the cage {circumflex over (P)} and the set G, hence, can be precomputed. A simple formula for the distance di(p) from a point p to the ith segment is given in Appendix C of [Chen and Weber 2015].
The last step of the preprocessing computes the matrix Q (see Equation (F.5)) and ∇2Ep2pƒ=QTQ, which is a constant matrix since Ep2pƒ is a quadratic form, hence can be computed and stored once.
Next, the computations are described that are needed to be carried out on each Newton iteration. The evaluation of the map f, its Wirtinger derivatives, energy, and gradient are computed using a matrix-matrix multiplication, where the right term in the product is a “thin” matrix (having many rows but very small number of columns). Such operation is known to be bandwidth limited, meaning that the computation time is dominated by the memory bandwidth of the GPU device. Hence, when performing the product one should strive to minimize the amount of access to the global memory of the device. To this end, the computation are performed using complex numbers and the result is converted back to real numbers when necessary. During runtime, the Wirtinger derivatives are first evaluated at the samples of G using the product:
Next, the following is being computed in a single dedicated kernel that accepts the matrix └ƒz
The next step computes the following complex matrix-matrix product:
where (⋅)H is the conjugate transpose (hermit) operator. The multiplication in (G.7) is computed by invoking the gemm cuBLAS procedure. Based on the summation in (32), the (real-valued) expression of the gradient of the energy is given by:
The gradient of the P2P energy also needs to be computed:
Q∈R2|P|×4n was already computed in preprocessing, and the vector q∈R2|P|×1 contains the target P2P positions (Equation (F.8)).
The computation of the Hessian is the most involved part of the implementation. First, the 4×4 matrix Ki+ that corresponds to the ith sample in H is computed in a dedicated kernel as described above. The real scalars α1, α2, β1, β2, β3 (Table 1) are computed. Then, based on the energy type, the implementation checks for negative eigenvalues and decides whether to compute Ki (Equation (19)) or its modified version Ki+. In the case of Eiso, the matrix Ki+ can be computed directly using (25). Otherwise, equations (23) and (24) are used. The entries of all these 4×4 matrices are stored in a matrix K∈R16×|H| where in each column the 16 values of the corresponding matrix Ki+ are stacked.
Unlike the computation of the gradient, which is bandwidth limited, the assembly of the Hessian is a ALU-bound operation, meaning that its runtime is governed by the amount of processing units (and their clock speed) rather than the speed of the global memory. This means that one can expect greater speedup compared to a CPU implementation. Yet, the amount of arithmetic operations involved is vast. The inventive strategy to use coarser sampling (Equation (33)) greatly reduces the Hessian assembly time.
In order to efficiently implement equation (33), the assembly of the Hessian is done separately on each of its four 2n×2n blocks. How to compute the ∇φ
be the 2×2 upper-left block of the matrix Ki+. Note that K (computed earlier) contains the corresponding values for all samples in its rows. The entries of ∇φ
where R∈R2|H|×2|H| is a diagonal matrix with diagonal entries (a1, d1, a2, d2, . . . , a|H|, d|H|) and B∈R2|H|×2|H| is a diagonal matrix with diagonal entries (b1, c1, b2, c2, . . . , b|H|, c|H|). The advantage of performing the computation in such a way is the ability to use the highly-optimized cuBLAS library [cuBLAS 2017]. The products RD and B{circumflex over (D)} in (G.10) are computed using a diagonal matrix-matrix multiplication procedure dgmm followed by a matrix summation (geam) which result in a matrix M=RD+B{circumflex over (D)}. Finally, the result is obtained through a matrix-matrix product DTM using gemm. This product dominates the runtime of (G.10).
Then, the gradient and the Hessian of Ep2pƒ are added to those of Ef (with the appropriate user-defined weight λ) to form the gradient and the Hessian of the full energy Ef (Equation (20)). the 10N+2 redundant real variables that correspond to the dimension of the null space of the harmonic subspace are then eliminated as explained above. This is done by removing the corresponding rows and columns from the gradient and the Hessian. After that, the Newton step is computed. The freely available cuSolverDN library is used to solve the linear system (1) using the dense GPU-accelerated direct Cholesky factorization.
The solution of the linear system provides the search direction for the Newton iteration. However, this step is not taken immediately. To ensure convergence to a local minimum which corresponds to a locally injective map, a backtracking line search is performed. This requires multiple evaluations of the energy and the local injectivity validation at various step sizes, starting from t=1 and dividing it by 2 at each step until the resulted map is accepted. For a map to be accepted, it is verified first that the energy decreases sufficiently (Algorithm 1, Line 17) and only then the local injectivity validation (Algorithm 1, Line 18) is performed. Both steps require evaluation of the Wirtinger derivatives ƒzt and at all samples, which varies depending on t. In general, evaluating these boils down to the same matrix product used in (G.6) for the computation of the gradient. Evaluating it many times can easily become the algorithm's bottleneck. Luckily, since the derivatives are linear in the variables, this can achieved with a single matrix product. The derivatives for step t are given by:
where φk, ψk are the current state of the (complex) variables and Δφk Δψk is the Newton search direction. The term on the left was already computed in (G.6). Hence, one only needs to compute the product D[Δφk Δψk] once, and for each step t, (G.11) boils down to a scalar-matrix multiplication followed by a (thin) matrix summation. Both operations are very efficient. The energy is then evaluated in a dedicated kernel that accepts the matrix └ƒzt┘ as input. Each thread handles a row that contains two complex numbers: ƒzt, , evaluate two real scalars: |ƒzt|2, , and use it to compute the scalar E using Table 1. Finally, the kernel performs a parallel reduction summation step to obtain Ef (Equation (31)).
In order to avoid CPU-GPU communication overhead, this step is carried out on the GPU by using the dynamic parallelism feature which allows kernels to invoke other kernels, therefore allows the GPU to execute the line search autonomously.
The iterations are terminated when t∥Δφk Δψk∥ is smaller than ϵ=1e−12 (Algorithm 1, Line 9). The algorithm always converges.
In contrast, for some inputs, the SLIM and AQP methods struggle to converge to machine precision error even after a thousand iterations.
Upon termination, the map itself is evaluated. This is done as explained in connection with Equation (G.2). Since this product is also performed on the GPU, the result can be used by the GPU directly for rendering, avoiding expensive CPU-GPU transfer times.
For evaluating the performance of the inventive algorithm, the following default parameters for all experiments were used, unless stated otherwise. The number of samples is |G|=104, and |H|=0.1 |G|. The two energy terms Ef and Ep2pƒ are balanced using λ=105. When comparing to mesh-based methods, a mesh with 10, 000 vertices is used. User prescribed P2P target positions q, are visualized as cyan disks. The images of the P2P under the map f(pi) are visualized as smaller black dots. A black dot which is centered inside a cyan disk indicates P2P satisfaction. The computer runs Windows 10 with Intel Xeon (6 cores) CPU E5-2643 v4 3.40 Ghz, 32 GB, and has an NVIDIA TitanX graphics card.
Throughout the experiments, the inventors consistently observed that the convergence rate of the inventive algorithm is significantly better than that of the first-order methods (L-BFGS, SLIM, AQP) which is attributed to its second-order nature. Furthermore, the convergence rate is always better even when compared against the Newton-Eigen method which employs a full eigen-factorization [Nocodal and Wright 2006, Equation 3.40] of the Hessian at each iteration, followed by a truncation of negative eigenvalues to 1e-10. This is demonstrated in
The full potential of the inventive deformation algorithm is leveraged when it runs on graphics hardware, yet our CPU parallel Matlab version (which is approximately 5 times slower on average) is still much faster than the alternatives we compare to.
More particularly, the top three rows use the harmonic subspace, while AQP and SLIM are general mesh-based methods. The total number of iterations needed for each method as well as the total runtime is reported on the right of each row. The graphs depict the decrease in energy as a function of iteration and as a function of time, separating harmonic from mesh-based methods as they are scaled differently. The “jump” of 0.9 s in AQP's graph (first iteration) is due to the initialization. Note that due to the large weight X, the energy at the first few iterations is very large and Ep2pƒ obscures the behavior of Eisoƒ. The zoom-in graphs show how the energy decreases more clearly. Our method converges in 15 iterations. Newton's method halts after 15 iterations due to non-positive definite Hessian. Newton-Eigen performs 35 iterations and is ×25 slower than our GPU solver. AQP and SLIM require 400 and 120 iterations and are ×39 and ×58 slower respectively.
Qualitatively, the results obtained with the mesh based methods tend to concentrate the distortion near the positional constraints. Refining the mesh does not seem to alleviate this issue completely while it increases the computation times even further. A mesh resolution of 104 vertices provides a reasonable balance between quality and speed and use it to evaluate AQP and SLIM. The inventive algorithm on the other hand is rather indifferent to the mesh resolution and using a mesh with 105 vertices, leads to extremely smooth results with no degradation in speed.