FREESTYLE ACQUISITION METHOD FOR HIGH-DIMENSIONAL MATERIAL

Information

  • Patent Application
  • Publication Number
    20240062460
  • Date Filed
    October 25, 2023
  • Date Published
    February 22, 2024
Abstract
A freestyle acquisition method for a high-dimensional material, belonging to the fields of computer graphics and computer vision. The learning of material information is transformed into a geometric learning problem on an unstructured point cloud, and a plurality of acquisition results in different lighting and view directions form a high-dimensional point cloud, each point in the point cloud being a vector formed by an image measurement value and the pose information of the object during image capture. According to the method, information from unstructured views can be effectively aggregated from the high-dimensional point cloud, which is disordered, irregular, uneven in distribution and limited in precision, and high-quality material attributes are reconstructed.
Description
TECHNICAL FIELD

The present application relates to a freestyle acquisition method for high-dimensional material, belonging to the fields of computer graphics and computer vision.


BACKGROUND

Digitization of real-world objects is one of the core issues in computer graphics and vision. At present, a digitized real object can be expressed by a three-dimensional mesh model and a six-dimensional Spatially Varying Bidirectional Reflectance Distribution Function (SVBRDF). The digitized real object can realistically reproduce its original appearance under any view angle and lighting condition, and has important applications in cultural heritage, e-commerce, computer games, and film production.


Although high-precision geometric models can easily be obtained with commercial mobile 3D scanners, it is also desirable to develop a lightweight device for freestyle appearance scanning for the following reasons. First, as long as the pose of the camera can be estimated reliably, such a device can scan objects of different sizes. Second, objects that are not allowed to be transported, such as precious cultural relics, can be scanned on site owing to the mobility of the device. In addition, the lightweight device is quick and inexpensive to manufacture, which makes it acceptable to a wider range of users, and it provides a user-friendly experience similar to geometric scanning.


Despite the surge in demand, effective non-planar appearance scanning is still a problem to be solved. On the one hand, most existing mobile appearance scanning work captures under a single point/parallel light, which leads to low sampling efficiency in the four-dimensional lighting-view domain, and prior knowledge is required to trade spatial resolution for angular precision (Giljoo Nam, Joo Ho Lee, Diego Gutierrez, and Min H Kim. 2018. Practical SVBRDF acquisition of 3D objects with unstructured flash photography. In SIGGRAPH Asia Technical Papers. 267.). On the other hand, fixed-view acquisition systems depend on fixed view conditions while the illumination changes. At present, it is not clear how to extend a fixed-view acquisition system to mobile devices, because mobile devices have unstructured and changing views and, owing to their small size, cannot completely cover the lighting field.


SUMMARY

In view of the shortcomings of the prior art, the present application provides a freestyle acquisition method for high-dimensional materials. The method can effectively utilize the acquisition condition information of each view to reconstruct high-quality object material attributes from disordered and unevenly distributed acquisition results.


The present application discloses a freestyle acquisition method for high-dimensional materials. The main idea of this method is that freestyle appearance scanning can be transformed into a geometric learning problem on unstructured point clouds, where each point in the point cloud represents an image measurement value and the pose information of an object during image capturing. Based on this idea, the present application designs a neural network, which can effectively aggregate information from different unstructured views, reconstruct spatially independent reflection attributes, optimize the lighting patterns used in the acquisition stage, and finally obtain high-quality material reconstruction results. The present application does not depend on a specific acquisition device: a fixed object can be acquired by a person holding the device, or the object can be placed on a turntable and rotated while the device is fixed, and the method is not limited to these two ways.


In this method, the learning of material information is transformed into a geometric learning problem on unstructured point clouds, and a plurality of sampling results in different lighting and view directions are combined into a high-dimensional point cloud; each point in the point cloud is a vector composed of the measured image value and the pose information of the object when the image is captured. According to this method, information of unstructured views can be effectively aggregated from the high-dimensional point cloud which is disordered, irregular, uneven in distribution and limited in precision, and the high-quality material attributes are reconstructed. The formal representation is as follows:






F(G(high dimensional point cloud))=m


where the feature extraction method G of point cloud data is not limited to a specific network structure, and other methods that can extract features from point clouds are also applicable; a nonlinear mapping network F is not limited to fully connected networks; the expression of material attributes of an object is not limited to a Lumitexel vector m.


This method includes two stages: a training stage and an acquisition stage.


The training stage includes the following steps:


(1) Calibrating parameters of an acquisition device and generating acquisition results simulating an actual camera as training data.


(2) Training a neural network by using the generated training data. In some embodiments, the neural network has the following characteristics (a code sketch of the resulting architecture is given after step (2.5)):


(2.1) An input of the neural network is Lumitexel vectors under k unstructured samplings, where k is a number of samplings; each value of Lumitexel describes the reflected luminous intensity of a sampling point along a specific view direction when illuminated by incident light from each light source; Lumitexel has a linear relationship with a luminous intensity of the light source, which is simulated by a linear fully connected layer.


(2.2) A first layer of the neural network includes the linear fully connected layer, which is configured to simulate a lighting pattern used during an actual acquisition and transform the k Lumitexels into camera acquisition results, and these k camera acquisition results are combined with pose information of the corresponding sampling point to form a high-dimensional point cloud.


(2.3) A second layer is the feature extraction network, and the feature extraction network is configured to independently extract features from each point in the high-dimensional point cloud to obtain feature vectors.


(2.4) After the feature extraction network, a max pooling layer is provided and configured to aggregate the feature vectors extracted from k unstructured views to obtain a global feature vector.


(2.5) After the max pooling layer, a nonlinear mapping network is provided and configured to reconstruct high-dimensional material information according to the global feature vector.
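The network described in steps (2.1) to (2.5) can be sketched as follows. This is a minimal illustration only, not a definitive implementation: the layer widths, the single-channel lighting layer, the pose encoding (position, normal and tangent per sampling, 9 values) and the output Lumitexel resolutions (6×8² diffuse, 6×32² specular, taken from the embodiment described later) are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class FreestyleMaterialNet(nn.Module):
    """Sketch of steps (2.1)-(2.5): lighting layer -> per-point feature
    extraction -> max pooling -> diffuse / specular mapping branches."""

    def __init__(self, n_lights=512, pose_dim=9, feat_dim=256,
                 out_diffuse=6 * 8 ** 2, out_specular=6 * 32 ** 2):
        super().__init__()
        # (2.2) trainable lighting pattern, applied as a linear layer (no bias)
        self.w_raw = nn.Parameter(torch.randn(n_lights, 1))
        # (2.3) per-point feature extraction: shared MLP == 1x1 Conv1d over the k points
        self.feature = nn.Sequential(
            nn.Conv1d(1 + pose_dim, 128, kernel_size=1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, kernel_size=1), nn.ReLU(),
        )
        # (2.5) two nonlinear mapping branches: diffuse and specular
        self.diffuse_head = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, out_diffuse))
        self.specular_head = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(), nn.Linear(1024, out_specular))

    def forward(self, lumitexels, poses):
        # lumitexels: (batch, k, n_lights); poses: (batch, k, pose_dim)
        w_l = torch.sigmoid(self.w_raw)            # lighting pattern in (0, 1)
        b = lumitexels @ w_l                       # (batch, k, 1) simulated measurements
        points = torch.cat([b, poses], dim=-1)     # high-dimensional point cloud
        feats = self.feature(points.transpose(1, 2))   # (batch, feat_dim, k)
        global_feat = feats.max(dim=2).values          # (2.4) max pooling over the k views
        return self.diffuse_head(global_feat), self.specular_head(global_feat)
```

In this sketch the first layer realizes the lighting pattern of step (2.2), the shared 1×1 convolutions implement the per-point feature extraction of step (2.3), and the maximum over the k views implements the aggregation of step (2.4).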


The acquisition stage includes the following steps:


(1) Material acquisition: constantly irradiating, by the acquisition device, the target three-dimensional object according to the lighting pattern, and obtaining, by the camera, a group of photos under unstructured views; the photos are taken as the input to obtain a geometric model of the sampled object with texture coordinates and the poses of the camera when taking the photos.


(2) Material reconstruction: according to the poses of the camera when taking the photos in the acquisition stage, obtaining a pose of a vertex corresponding to each effective texture coordinate on the sampled object when taking each photo; forming the high-dimensional point cloud as an input of the feature extraction network, as the second layer of the neural network, according to the acquired photos and pose information, and obtaining the high-dimensional material information through calculation.


Further, the unstructured sampling is a freestyle random sampling with non-fixed view directions, and the sampling data is disordered, irregular and uneven in distribution; a fixed object can be acquired by a person holding the acquisition device, or the object can be placed on a turntable for rotation while the acquisition device is fixed.


Further, in the process of generating the training data, when the light source is colored, it is necessary to correct a spectral response relationship among the light source, the sampled object and the camera, and a correction method is as follows:


A spectral distribution curve of an unknown color light source L is defined as S_c1^L(λ), where λ represents a wavelength, c1 represents one of the RGB channels, and the spectral distribution curve L(λ) of the light source with a luminous intensity of {I_R, I_G, I_B} can be expressed as:






$$L(\lambda)=I_R S_R^L(\lambda)+I_G S_G^L(\lambda)+I_B S_B^L(\lambda)$$


A reflection spectrum distribution curve p(λ) of any sampling point p is expressed as a linear combination of three unknown bases S_c2^p(λ) with coefficients of p_R, p_G, p_B respectively, where c2 represents one of the three RGB channels:






$$p(\lambda)=p_R S_R^p(\lambda)+p_G S_G^p(\lambda)+p_B S_B^p(\lambda)$$


The spectral distribution curve of a camera C is expressed as a linear combination of S_c3^C(λ); under the illumination of a light source with a luminous intensity of {I_R, I_G, I_B}, a measured value of the camera for a sampling point with a reflection coefficient of {p_R, p_G, p_B} in a specific channel c3 is as follows:





$$\int L(\lambda)\,p(\lambda)\,S_{c_3}^C(\lambda)\,d\lambda=\sum_{c_1,c_2} I_{c_1}\,p_{c_2}\,\delta(c_1,c_2,c_3)$$


$$\delta(c_1,c_2,c_3)=\int S_{c_1}^L(\lambda)\,S_{c_2}^p(\lambda)\,S_{c_3}^C(\lambda)\,d\lambda$$


Under an illumination condition of {I_R,I_G,I_B}={1,0,0}/{0,1,0}/{0,0,1}, a color-checker with a known reflection coefficient of {p_R,p_G,p_B} is photographed, linear equations are established according to the measured values acquired by the camera, and a color correction matrix δ(c1,c2,c3) with a size of 3×3×3 is solved to represent the spectral response relationship among the light source, the sampled object and the camera.
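Since each pure illumination {1,0,0}/{0,1,0}/{0,0,1} reduces the measurement to a linear combination of δ(c1, ·, c3) with the known patch albedos as coefficients, the 27 entries can be recovered channel by channel with least squares. The sketch below assumes the color-checker measurements have already been arranged into an array indexed by illumination, patch and camera channel; the function name and data layout are illustrative assumptions.

```python
import numpy as np

def solve_color_correction(albedos, measurements):
    """Solve the 3x3x3 color correction matrix delta(c1, c2, c3).

    albedos:      (n_patches, 3) known reflection coefficients {pR, pG, pB}.
    measurements: (3, n_patches, 3) camera values, indexed by the pure
                  illumination c1, the color-checker patch, and the channel c3.
    """
    delta = np.zeros((3, 3, 3))
    for c1 in range(3):          # illumination {1,0,0} / {0,1,0} / {0,0,1}
        for c3 in range(3):      # camera channel
            # measurement = sum_c2 p_c2 * delta(c1, c2, c3): linear in delta(c1, :, c3)
            rhs = measurements[c1, :, c3]
            delta[c1, :, c3], *_ = np.linalg.lstsq(albedos, rhs, rcond=None)
    return delta
```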


Further, in step (2.1) of the training stage, a relationship among an observed value B of a sampling point p on the surface of the object, a reflection function fr and the luminous intensity of each light source can be described as follows:







$$B(I,P)=\int_{l} I(l)\,\frac{1}{\lVert x_l-x_p\rVert^{2}}\,\Psi(x_l,-\omega_i')\,V(x_l,x_p)\,f_r(\omega_i';\omega_o',P)\,(\omega_i'\cdot n_p)^{+}\,(-\omega_i'\cdot n_l)^{+}\,dx_l$$








where I represents the luminous information of each light source l, including: the spatial position x_l of the light source l, the normal vector n_l of the light source l, and the luminous intensity I(l) of the light source l; P includes the parameter information of a sampling point p, including: the spatial position x_p of the sampling point and the material parameters n, t, α_x, α_y, ρ_d, and ρ_s; Ψ(x_l, ⋅) describes the luminous intensity distribution of the light source l in different incident directions; V represents a binary function for the visibility of x_l from x_p; (⋅)⁺ is a dot product of two vectors with negative values truncated to 0; and f_r(ω_i′; ω_o′, P) is a two-dimensional reflection function of ω_i′ when ω_o′ is fixed.
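For a light board with discrete LEDs the integral over x_l reduces to a sum over the individual light sources. The sketch below evaluates B(I, P) in that discrete form; the callables `brdf`, `psi` and `visible`, which stand in for f_r(·; ω_o′, P), Ψ and V, are assumptions of the illustration rather than part of the method.

```python
import numpy as np

def observed_value(intensity, x_l, n_l, x_p, n_p, brdf, psi, visible):
    """Discrete version of B(I, P): sum over LEDs l of
    I(l) / ||x_l - x_p||^2 * Psi(x_l, -w_i) * V(x_l, x_p)
    * f_r(w_i; w_o, P) * (w_i . n_p)+ * (-w_i . n_l)+ .
    intensity: (L,); x_l, n_l: (L, 3); x_p, n_p: (3,);
    brdf(w_i) returns f_r for the fixed view direction; psi, visible are callables."""
    d = x_l - x_p                                  # point-to-light vectors, (L, 3)
    dist2 = np.sum(d * d, axis=1)
    w_i = d / np.sqrt(dist2)[:, None]              # incident directions at x_p
    cos_p = np.clip(w_i @ n_p, 0.0, None)          # (w_i . n_p)+
    cos_l = np.clip(np.sum(-w_i * n_l, axis=1), 0.0, None)   # (-w_i . n_l)+
    fr = np.array([brdf(w) for w in w_i])          # f_r(w_i; w_o, P) per LED
    fall = np.array([psi(xl, -w) for xl, w in zip(x_l, w_i)])
    vis = np.array([visible(xl, x_p) for xl in x_l], dtype=float)
    return np.sum(intensity / dist2 * fall * vis * fr * cos_p * cos_l)
```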


The Lumitexel vector is denoted as m(l;P).






$$m(l;P)=B(\{I(l)=1,\ I(i)=0\ \forall i\neq l\},\,P)$$


In the above equation, B is a representation under a single channel illumination; when the light source is a color light source, B is expanded to the following form:







$$B(I,P;c_3)=\int_{l}\sum_{c_1,c_2} I(l;c_1)\,\frac{1}{\lVert x_l-x_p\rVert^{2}}\,\Psi(x_l,-\omega_i')\,V(x_l,x_p)\,f_r(\omega_i';\omega_o',P,c_2)\,(\omega_i'\cdot n_p)^{+}\,(-\omega_i'\cdot n_l)^{+}\,\delta(c_1,c_2,c_3)\,dx_l$$








where f_r(ω_i′; ω_o′, P, c_2) represents the result of f_r(ω_i′; ω_o′, P) evaluated with ρ_d = ρ_d^{c2} and ρ_s = ρ_s^{c2}.


Further, in the step (2.3) of the training stage, an equation of the feature extraction network is as follows:






$$V_{\mathrm{feature}}(j)=f\big(\mathrm{concat}[B(I,P_j),\,x_p^j,\,\hat{n}_p^j,\,\hat{t}_p^j]\big),\quad 1\le j\le k$$


where f represents a one-dimensional convolution function with a convolution kernel size of 1×1, B(I,P_j) represents an output result of the first-layer network or the acquired measured value, and x_p^j, n̂_p^j, t̂_p^j represent the spatial position of the sampling point, the geometric normal vector of the sampling point and the geometric tangent vector in the jth sampling, respectively; n̂_p is obtained from the geometric model, t̂_p represents an arbitrary unit vector orthogonal to n̂_p, and n̂_p, t̂_p can be transformed by the pose of the camera at the time of the jth sampling to obtain n̂_p^j, t̂_p^j; V_feature(j) represents the feature vector output by the network for the jth sampling.


Further, in the step (2.5) of the training stage, a nonlinear mapping network is formally expressed as follows:






$$y_{i+1}^d=f_{i+1}^d\big(y_i^d W_{i+1}^d+b_{i+1}^d\big),\quad i\ge 1$$


$$y_{i+1}^s=f_{i+1}^s\big(y_i^s W_{i+1}^s+b_{i+1}^s\big),\quad i\ge 1$$


where fi+1 represents a mapping function of a (i+1)th layer network, Wi+1 represents a parameter matrix of the (i+1)th layer network, bi+1 represents an offset vector of the (i+1)th layer network, yi+1 represents an output of the (i+1)th layer network, d and s represent two branches of diffuse reflection and specular reflection respectively, and the inputs y1d and y1s represent global feature vectors output by the max pooling layer.


Further, a loss function of the neural network is designed as follows:


(1) A Lumitexel space is virtualized, which is a cube centered at the spatial position x_p of the sampling point; the x-axis direction of the cube's central coordinate system is t̂_p and the z-axis direction is n̂_p, where n̂_p represents the geometric normal vector and t̂_p is an arbitrary unit vector orthogonal to n̂_p.


(2) A camera is virtualized, and the view direction represents a positive direction of the z axis of the cube.


(3) For diffuse Lumitexel, a resolution of the cube is 6×Nd2, and for specular Lumitexel, a resolution of the cube is 6×Ns2, that is, Nd2 and Ns2 points are evenly sampled from each face as virtual point light sources with a luminous intensity of a unit luminous intensity.


a) A specular albedo ρ_s of the sampling point is set to 0, and a diffuse reflection feature vector m̃_d in this Lumitexel space is generated.


b) A diffuse albedo ρ_d is set to 0, and a specular reflection feature vector m̃_s in the Lumitexel space is generated.


c) The outputs of the neural network are vectors m_d, m_s, where m_d and m̃_d have the same length and m_s and m̃_s have the same length; the vectors m_d, m_s are the predictions of the diffuse reflection feature vector m̃_d and the specular reflection feature vector m̃_s.


(4) A loss function of a material feature part is expressed as follows:






$$L=\lambda_d\sum_l\big[m_d(l)-\tilde{m}_d(l)\big]^2+\lambda_s\,\beta\sum_l\big[m_s(l)-\log(1+\tilde{m}_s(l))\big]^2$$


where λd and λs represent the loss weights of md and ms, respectively, and a confidence coefficient β is used to measure the loss of the specular Lumitexel, and log acts on each dimension of the vector.


The confidence coefficient β is determined as follows:







$$\beta=\min\!\left(\frac{1}{\epsilon}\,\max_j\!\left[\frac{\max_l\,\log\!\big(1+f_r(\omega_i^{j\prime}(l);\,\omega_o^{j\prime},P)\big)}{\max_{\omega_i'}\,\log\!\big(1+f_r(\omega_i';\,\omega_o^{j\prime},P)\big)}\right],\,1\right),\quad 1\le j\le k$$





where the term max_l log(1+f_r(ω_i^j′(l); ω_o^j′, P)) represents the logarithm of the maximum rendering value over all single light sources sampled in the jth sampling, the term max_{ω_i′} log(1+f_r(ω_i′; ω_o^j′, P)) represents the logarithm of the maximum rendering value over the single light sources theoretically available in the jth sampling, and ϵ represents the ratio adjustment factor.
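A compact sketch of this loss is given below, assuming the two per-sampling maxima entering β have been precomputed during data generation; the weights λ_d, λ_s and the factor ϵ are hyperparameters of a concrete embodiment, and the tensor layout is an assumption of the illustration.

```python
import torch

def material_loss(m_d, m_s, gt_d, gt_s, max_sampled, max_possible,
                  lambda_d=1.0, lambda_s=1.0, eps=0.5):
    """Loss of the material feature part with confidence coefficient beta.

    m_d, m_s:      network predictions of the diffuse / specular Lumitexels.
    gt_d, gt_s:    ground-truth Lumitexels (the specular one is compared in log space).
    max_sampled:   (k,) values of max_l log(1 + f_r(w_i^j(l); w_o^j, P)) per sampling j.
    max_possible:  (k,) values of max_{w_i} log(1 + f_r(w_i; w_o^j, P)) per sampling j.
    """
    # beta = min( (1/eps) * max_j [sampled / possible], 1 )
    beta = torch.clamp((max_sampled / max_possible).max() / eps, max=1.0)
    loss_d = ((m_d - gt_d) ** 2).sum()
    loss_s = ((m_s - torch.log1p(gt_s)) ** 2).sum()
    return lambda_d * loss_d + lambda_s * beta * loss_s
```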


Further, in the acquisition stage, geometric alignment is carried out after completing the material acquisition, and then material reconstruction is carried out after the geometric alignment; the geometric alignment includes: a scanned geometric model is obtained by scanning the object with a scanner, the scanned geometric model is aligned with a three-dimensional reconstructed geometric model, and the three-dimensional reconstructed geometric model is replaced with the scanned three-dimensional geometric model.


Further, for the effective texture coordinates, pixels in the photos are taken out sequentially according to the acquired photos and the pose information of the sampling points, the validity of the pixels is checked, and the high-dimensional point cloud is formed by combining the pixel value and the corresponding point pose; for a point p on the surface of the sampled object determined by the effective texture coordinates, a criterion that the jth sampling is valid for the vertex p is expressed as follows:


(1) A position xpj of the vertex p is visible to the camera in this sampling, and xpj is located in a sampling space defined when training the network.


(2) (ω_o′ · n̂_p^j) > θ, where (⋅) represents a dot product operation, θ represents a lower bound of a valid sampling direction, ω_o′ represents the direction of outgoing light in the world coordinate system, and n̂_p^j represents the normal vector of the vertex p in the jth sampling.


(3) A numerical value of each channel of the pixels in the photos is in an interval [a, b], where a and b represent lower and upper bounds of a valid sampling brightness.


When all three conditions are satisfied, the jth sampling is considered valid for the vertex p, and the result of the jth sampling is added to the high-dimensional point cloud (a validity check in this form is sketched below).
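A sketch of this three-part validity test follows; the visibility and sampling-space tests are passed in as callables, the default θ is an arbitrary placeholder, and a = 32, b = 224 follow the embodiment described later. All of these are assumptions of the illustration.

```python
import numpy as np

def sampling_is_valid(x_pj, n_pj, w_o, pixel, visible, in_sampling_space,
                      theta=0.1, a=32, b=224):
    """Return True when the jth sampling is valid for vertex p (conditions (1)-(3))."""
    # (1) vertex visible to the camera and inside the sampling space used for training
    if not (visible(x_pj) and in_sampling_space(x_pj)):
        return False
    # (2) outgoing direction sufficiently aligned with the vertex normal
    if np.dot(w_o, n_pj) <= theta:
        return False
    # (3) every channel of the pixel inside the valid brightness interval [a, b]
    return bool(np.all((pixel >= a) & (pixel <= b)))
```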


Further, the material parameters can be fitted after reconstructing the material information; the fitting process is divided into two steps:


(1) Fitting a local coordinate system and a roughness: for a point p on the surface of the sampled object determined by the valid texture coordinates, the local coordinate system and the roughness in the material parameters are fitted according to a single-channel specular reflection vector output by the network by an L-BFGS-B method.


(2) Fitting albedos: the specular albedo and the diffuse albedo are solved by using a trust region algorithm, the local coordinate system and the roughness obtained in a previous process are fixed during solution, and an observed value is synthesized in the view direction used in the acquisition stage, so as to make the synthesized observed value as close as possible to an observed value obtained.


The method provided by the present application has the beneficial effects that the learning of material information is transformed into a geometric learning problem on unstructured point cloud, and a plurality of sampling results in different lighting and view directions are combined into a high-dimensional point cloud, and each point in the point cloud is a vector composed of an image measurement value and the pose information of an object during image capturing. According to this method, information of unstructured views can be effectively aggregated from the high-dimensional point cloud which is disordered, irregular, and uneven in distribution and limited in precision, and the material attributes with high quality are recovered.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a three-dimensional schematic diagram of an acquisition device according to an embodiment of the present application;



FIG. 2 is a front view of an acquisition device according to an embodiment of the present application;



FIG. 3 is a side view of an acquisition device according to an embodiment of the present application;



FIG. 4 is a schematic diagram of the relationship between an acquisition device and a sampling space according to an embodiment of the present application;



FIG. 5 is a flowchart of an acquisition method according to an embodiment of the present application;



FIG. 6 is a schematic structural diagram of a neural network according to an embodiment of the present application;



FIG. 7 is a single channel display of the lighting pattern obtained according to an embodiment of the present application, in which the gray value is used to represent the luminous intensity;



FIG. 8 is a Lumitexel vector result reconstructed by using the system according to an embodiment of the present application; and



FIG. 9 is the result of the material attribute of a sampled object reconstructed by using the system according to an embodiment of the present application.





DESCRIPTION OF EMBODIMENTS

In order to make the object, technical solution and advantages of the present application more clear, the present application will be described in detail with the accompanying drawings.


The present application provides a freestyle acquisition method for high-dimensional materials, which can be specifically implemented by the following steps:


I. Training Stage:


1. Training data are generated and the parameters of an acquisition device are calibrated, including the distance and direction from a light source to the origin of a sampling space, the characteristic curve of the light source, the distance and direction from a camera to the origin of the sampling space, and the intrinsic and extrinsic parameters of the camera. By using these parameters, the acquisition results for simulating actual cameras are generated as training data. The rendering model used when generating training data is a GGX model, and the generation formula satisfies:








$$f_r(\omega_i',\omega_o';P)=\frac{\rho_d}{\pi}+\rho_s\,\frac{D_{GGX}(\omega_h;\alpha_x,\alpha_y)\,F(\omega_i,\omega_h)\,G_{GGX}(\omega_i,\omega_o;\alpha_x,\alpha_y)}{4\,(\omega_i\cdot n)(\omega_o\cdot n)}$$









where f_r(ω_i′, ω_o′; P) is a four-dimensional reflection function of ω_i′ and ω_o′, ω_i′ represents the direction of incident light in the world coordinate system, ω_o′ represents the direction of outgoing light in the world coordinate system, ω_i is the incident direction in the local coordinate system, ω_o is the outgoing direction in the local coordinate system, and ω_h is the half-way vector in the local coordinate system. P contains the parameter information of a sampling point, including the material parameters n, t, α_x, α_y, ρ_d, ρ_s of the sampling point, where n represents the normal vector in the world coordinate system, t represents the x-axis direction of the local coordinate system of the sampling point in the world coordinate system, and n and t are used for transforming the incident direction and the outgoing direction from the world coordinate system to the local coordinate system. α_x and α_y represent roughness coefficients, ρ_d represents a diffuse albedo, ρ_s represents a specular albedo, and ρ_d and ρ_s are each one scalar in the single-channel case and three scalars (ρ_d^R, ρ_d^G, ρ_d^B) and (ρ_s^R, ρ_s^G, ρ_s^B) in the color case. D_GGX is the micro-surface distribution term, F is the Fresnel term, and G_GGX represents the shadowing coefficient function.
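A minimal sketch of this GGX rendering model evaluated in the local frame is given below. The text names D_GGX, F and G_GGX without fixing their exact forms here, so the Smith shadowing term and the Schlick Fresnel approximation with a constant f0 are illustrative choices of the sketch, not part of the method.

```python
import numpy as np

def ggx_brdf(w_i, w_o, rho_d, rho_s, alpha_x, alpha_y, f0=0.04):
    """Anisotropic GGX model in the local frame (z axis = normal n).
    w_i, w_o are unit vectors in the local frame; f0 and the Smith/Schlick
    terms are assumptions of this sketch."""
    w_h = w_i + w_o
    w_h = w_h / np.linalg.norm(w_h)               # half-way vector
    # micro-surface distribution term D_GGX
    d = 1.0 / (np.pi * alpha_x * alpha_y *
               (w_h[0] ** 2 / alpha_x ** 2 + w_h[1] ** 2 / alpha_y ** 2 + w_h[2] ** 2) ** 2)
    # Smith shadowing term: G_GGX = G1(w_i) * G1(w_o)
    def g1(w):
        return 2.0 / (1.0 + np.sqrt(
            1.0 + (alpha_x ** 2 * w[0] ** 2 + alpha_y ** 2 * w[1] ** 2) / w[2] ** 2))
    # Schlick approximation of the Fresnel term F
    f = f0 + (1.0 - f0) * (1.0 - np.dot(w_i, w_h)) ** 5
    spec = d * f * g1(w_i) * g1(w_o) / (4.0 * w_i[2] * w_o[2])
    return rho_d / np.pi + rho_s * spec
```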


When the light source used in the acquisition is colored, the spectral response relationship among the light source, the sampled object and the camera should be acquired first. The correction method is as follows: a spectral distribution curve of an unknown color light source L is defined as S_c1^L(λ), where λ represents a wavelength, c1 represents one of the RGB channels, and the spectral distribution curve L(λ) of the light source with a luminous intensity of {I_R, I_G, I_B} can be expressed as:






$$L(\lambda)=I_R S_R^L(\lambda)+I_G S_G^L(\lambda)+I_B S_B^L(\lambda)$$


In some embodiments, a reflection spectrum distribution curve p(λ) of any sampling point p is expressed as a linear combination of three unknown bases S_c2^p(λ) with coefficients of p_R, p_G, p_B respectively, where c2 represents one of the three RGB channels:






$$p(\lambda)=p_R S_R^p(\lambda)+p_G S_G^p(\lambda)+p_B S_B^p(\lambda)$$


In some embodiments, the spectral distribution curve of a camera C is expressed as a linear combination of S_c3^C(λ); under the illumination of a light source with a luminous intensity of {I_R, I_G, I_B}, a measured value of the camera for a sampling point with a reflection coefficient of {p_R, p_G, p_B} in a specific channel c3 satisfies:





$$\int L(\lambda)\,p(\lambda)\,S_{c_3}^C(\lambda)\,d\lambda=\sum_{c_1,c_2} I_{c_1}\,p_{c_2}\,\delta(c_1,c_2,c_3)$$


$$\delta(c_1,c_2,c_3)=\int S_{c_1}^L(\lambda)\,S_{c_2}^p(\lambda)\,S_{c_3}^C(\lambda)\,d\lambda$$


Under an illumination condition of {I_R,I_G,I_B}={1,0,0}/{0,1,0}/{0,0,1}, a color-checker with a known reflection coefficient of {p_R,p_G,p_B} is photographed, linear equations are established according to the measured values acquired by the camera, and a color correction matrix δ(c1,c2,c3) with a size of 3×3×3 may be solved to represent the spectral response relationship among the light source, the sampled object and the camera.


2. The generated training data are used to train the neural network shown in FIG. 6. The characteristics of the neural network are as follows:


(1) A relationship among an observed value B of a sampling point p on the surface of the object, a reflection function fr and the luminous intensity of each light source can be described as follows:







$$B(I,P)=\int_{l} I(l)\,\frac{1}{\lVert x_l-x_p\rVert^{2}}\,\Psi(x_l,-\omega_i')\,V(x_l,x_p)\,f_r(\omega_i';\omega_o',P)\,(\omega_i'\cdot n_p)^{+}\,(-\omega_i'\cdot n_l)^{+}\,dx_l$$








where I represents the luminous information of each light source l, including: the spatial position x_l of the light source l, the normal vector n_l of the light source l, and the luminous intensity I(l) of the light source l; P includes the parameter information of a sampling point p, including: the spatial position x_p of the sampling point and the material parameters n, t, α_x, α_y, ρ_d, ρ_s; Ψ(x_l, ⋅) describes the luminous intensity distribution of the light source l in different incident directions; V represents a binary function for the visibility of x_l from x_p; (⋅)⁺ is a dot product operation of two vectors, with negative values truncated to 0; f_r(ω_i′; ω_o′, P) is a two-dimensional reflection function of ω_i′ when ω_o′ is fixed.


The input of the neural network is the Lumitexels under k disordered and irregular samplings, where k is the number of samplings; a Lumitexel is a vector, recorded as m(l;P), and each value of the Lumitexel describes the reflected luminous intensity of a sampling point along a specific view direction when illuminated by incident light from each light source.






$$m(l;P)=B(\{I(l)=1,\ I(i)=0\ \forall i\neq l\},\,P)$$


In the above formula, B is a representation under a single channel; when the light source is a color light source, B is expanded to the following form:







$$B(I,P;c_3)=\int_{l}\sum_{c_1,c_2} I(l;c_1)\,\frac{1}{\lVert x_l-x_p\rVert^{2}}\,\Psi(x_l,-\omega_i')\,V(x_l,x_p)\,f_r(\omega_i';\omega_o',P,c_2)\,(\omega_i'\cdot n_p)^{+}\,(-\omega_i'\cdot n_l)^{+}\,\delta(c_1,c_2,c_3)\,dx_l$$








where f_r(ω_i′; ω_o′, P, c_2) is the result of f_r(ω_i′; ω_o′, P) evaluated with ρ_d = ρ_d^{c2} and ρ_s = ρ_s^{c2}; B has a linear relationship with the luminous intensity of the light source and can be simulated by a linear fully connected layer.


(2) The first layer of the neural network includes a linear fully connected layer, and the parameter matrix of the linear fully connected layer is trained by the following formula:






$$W_l=f_W(W_{raw})$$


where W_raw is the parameter to be trained; W_l is the illumination matrix, whose size is 1×N for a single-channel light source and 3×N for a color light source, N being the vector length of the Lumitexel; f_W is a mapping used to transform W_raw so that the generated illumination matrix corresponds to the possible luminous intensities of the light source; in this example, the mapping f_W uses a Sigmoid function, which limits the values of the illumination matrix W_l of the first-layer network to (0,1), but f_W is not limited to the Sigmoid function.


By taking Wl as the luminous intensity of a light source, k sampling observed values B(I,P1),B(I,P2) . . . B(I,Pk) are calculated according to the above-mentioned relation (1).
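A sketch of this parameterization for the single-channel case follows; the LED count of 512 matches the device example later in the text, and the final line simply multiplies the pattern by a Lumitexel to obtain one simulated measurement, as in relation (1). The PyTorch details are assumptions of the illustration.

```python
import torch
import torch.nn as nn

n_lights = 512                                    # N LEDs on the light board
w_raw = nn.Parameter(torch.randn(1, n_lights))    # trainable W_raw (single-channel case)

def lighting_pattern(w_raw):
    """f_W: map W_raw to the illumination matrix W_l with entries in (0, 1)."""
    return torch.sigmoid(w_raw)

# Simulated observed value for one sampling: B = W_l m, with m the Lumitexel vector.
lumitexel = torch.rand(n_lights)                  # placeholder Lumitexel of length N
b = lighting_pattern(w_raw) @ lumitexel           # one observed value for this sampling
```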


(3) A feature extraction network starts from a second layer, and features are independently extracted in k samplings to obtain feature vectors, and the formula is as follows:






$$V_{\mathrm{feature}}(j)=f\big(\mathrm{concat}[B(I,P_j),\,x_p^j,\,\hat{n}_p^j,\,\hat{t}_p^j]\big),\quad 1\le j\le k$$


where f is a one-dimensional convolution function with a convolution kernel size of 1×1, B(I,P_j) represents an output result of the first-layer network or the acquired measured value, and x_p^j, n̂_p^j, t̂_p^j are the spatial position of the sampling point, the geometric normal vector of the sampling point and the geometric tangent vector in the jth sampling, respectively; n̂_p is obtained from the geometric model, t̂_p is an arbitrary unit vector orthogonal to n̂_p, and n̂_p, t̂_p can be transformed by the pose of the camera at the time of the jth sampling to obtain n̂_p^j, t̂_p^j; V_feature(j) is the feature vector output by the network for the jth sampling.


(4) The feature extraction network is followed by a max pooling layer. The max pooling operation formula is as follows:






$$V_{\mathrm{feature}}=\max\big(V_{\mathrm{feature}}(1),V_{\mathrm{feature}}(2),\ldots,V_{\mathrm{feature}}(k)\big)$$


where the max pooling operation is carried out in each dimension of Vfeature(1),Vfeature(2), . . . , Vfeature(k).


(5) The max pooling layer is followed by a nonlinear mapping network:






$$y_{i+1}^d=f_{i+1}^d\big(y_i^d W_{i+1}^d+b_{i+1}^d\big),\quad i\ge 1$$


$$y_{i+1}^s=f_{i+1}^s\big(y_i^s W_{i+1}^s+b_{i+1}^s\big),\quad i\ge 1$$


where fi+1 is a mapping function of a (i+1)th layer network, Wi+1 is a parameter matrix of the (i+1)th layer network, bi+1 is an offset vector of the (i+1)th layer network, yi+1 is an output of the (i+1)th layer network, and d and s represent two branches of diffuse reflection and specular reflection respectively, and the inputs y1d and y1s are Vfeature.


(6) The loss function of the neural network is as follows:


(6.1) A Lumitexel space is virtualized, which is a cube centered at the spatial position x_p of the sampling point; the x-axis direction of the cube's central coordinate system is t̂_p and the z-axis direction is n̂_p, where n̂_p is the geometric normal vector and t̂_p is an arbitrary unit vector orthogonal to n̂_p.


(6.2) A camera is virtualized, and a view direction is a positive direction of the z axis of the cube.


(6.3) For diffuse Lumitexel, a resolution of the cube is 6×Nd2, and for specular Lumitexel, the resolution of the cube is 6×Ns2, that is, Nd2, Ns2 points are evenly sampled from each face as virtual point light sources with a luminous intensity of a unit luminous intensity; in this example, Nd=8, Ns=32.


(a) A specular albedo ρ_s of the sampling point is set to 0, and a diffuse reflection feature vector m̃_d in this Lumitexel space is generated.


(b) A diffuse albedo ρ_d is set to 0, and a specular reflection feature vector m̃_s in the Lumitexel space is generated.


(c) The outputs of the neural network are vectors m_d, m_s, where m_d and m̃_d have the same length and m_s and m̃_s have the same length; the vectors m_d, m_s are the predictions of the diffuse reflection feature vector m̃_d and the specular reflection feature vector m̃_s.


(6.4) A loss function of a material feature part is expressed as follows:






$$L=\lambda_d\sum_l\big[m_d(l)-\tilde{m}_d(l)\big]^2+\lambda_s\,\beta\sum_l\big[m_s(l)-\log(1+\tilde{m}_s(l))\big]^2$$


where λd and λs respectively represent the loss weights of md, ms, and a confidence coefficient β is used to measure the loss of the specular Lumitexel, and log acts on each dimension of the vector.


The confidence coefficient β is determined as follows:






$$\beta=\min\!\left(\frac{1}{\epsilon}\,\max_j\!\left[\frac{\max_l\,\log\!\big(1+f_r(\omega_i^{j\prime}(l);\,\omega_o^{j\prime},P)\big)}{\max_{\omega_i'}\,\log\!\big(1+f_r(\omega_i';\,\omega_o^{j\prime},P)\big)}\right],\,1\right),\quad 1\le j\le k$$





where the term max_l log(1+f_r(ω_i^j′(l); ω_o^j′, P)) represents the logarithm of the maximum rendering value over all single light sources sampled in the jth sampling, the term max_{ω_i′} log(1+f_r(ω_i′; ω_o^j′, P)) represents the logarithm of the maximum rendering value over the single light sources theoretically available in the jth sampling, and ϵ is the ratio adjustment factor; in this example, ϵ=50%.


3. After the training, the parameter Wraw of the linear fully connected layer of the network is taken out and transformed by the formula Wl=fW(Wraw) as the lighting pattern.


II. Acquisition stage: the acquisition stage can be subdivided into a material acquisition stage, a geometric alignment stage (optional) and a material reconstruction stage.


1. Material Acquisition Stage


The acquisition device constantly illuminates the target three-dimensional object according to the lighting pattern, and the camera obtains a group of photos under unstructured views. Taking the photos as input, a geometric model of the sampled object and the poses of the camera when taking the photos can be obtained by using publicly available three-dimensional reconstruction tools.


2. Geometric Alignment Stage (Optional)


(1) An object is scanned with a high-precision scanner to obtain a geometric model.


(2) The geometric model scanned by the scanner and the 3D reconstructed geometric model are aligned to replace the 3D reconstructed geometric model; the alignment method can be CPD (A. Myronenko and X. Song. 2010. Point Set Registration: Coherent Point Drift. IEEE PAMI 32, 12 (2010), 2262-2275. https://doi.org/10.1109/TPAMI.2010.46).


3. Material Reconstruction Stage:


(1) According to the pose of the camera when each photo is taken in the material acquisition step, the pose x_p^j, n̂_p^j, t̂_p^j of each vertex on the sampled object when the jth photo is taken is obtained.


(2) Iso-charts, a tool known in the field, is used to obtain a geometric model with texture coordinates from the geometric model of the sampled object obtained by three-dimensional reconstruction, or from the aligned geometric model of the sampled object scanned by the scanner.


(3) For the effective texture coordinates, pixels in the photos are taken out in turn according to the acquired group of photos r_1, r_2, …, r_π and the pose information of the sampling points, the validity of the pixels is checked, and the high-dimensional point cloud is formed by combining the pixel values and the corresponding vertex poses x_p, n̂_p, t̂_p; this point cloud serves as the input vector of the second layer of the neural network, i.e., the feature extraction network, and the output vectors m_d and m_s of the last layer are calculated.


For a point p on the surface of the sampled object determined by the effective texture coordinates, the criterion that the jth sampling is valid for the vertex p is expressed as follows:


1) A position xpj of the vertex p is visible to the camera in this sampling, and xpj is located in a sampling space defined when training the network.


2) (ω_o′ · n̂_p^j) > θ, where (⋅) is a dot product operation, θ is a lower bound of a valid sampling direction, ω_o′ represents the direction of outgoing light in the world coordinate system, and n̂_p^j represents the normal vector of the vertex p in the jth sampling.


3) A numerical value of each channel of the pixels in the photos is in an interval [a, b], where a and b are lower and upper bounds of a valid sampling brightness; in this example, a=32 and b=224.


When all three conditions are satisfied, it is considered that the jth sampling is valid for the vertex p, and a result of the jth sampling is added to the high-dimensional point cloud.


(4) Fitting material parameters, which is divided into two steps:


1) Fitting Local Coordinate System and Roughness


For a point p on the surface of the sampled object determined by the effective texture coordinates, the local coordinate system and the roughness in the material parameters are fitted to the single-channel specular reflection vector output by the network, using the L-BFGS-B method; the optimization objective is as follows:





$$\text{minimize}\ \sum_l\big\lVert m_s(l)-f_r(\omega_i';\omega_o',P)\big\rVert^2$$


where l is the serial number of the virtual light source in (6.3), m_s(l) indicates the value of the lth dimension of the specular reflection feature vector predicted by the network, ω_i′ indicates the incident direction formed by the virtual light source with the serial number l and the sampling point, ω_o′ indicates the outgoing direction from the sampling point to the virtual camera in (6.2), and P contains the normal vector n′, the tangent vector t′ and the other material parameters p′ of the sampling point, which vary with the selected model. For example, a GGX model is used in this project, and p′ includes the anisotropic roughness, the specular albedo and the diffuse albedo. n′, t′ and p′ in the above optimization objective are the optimizable parameters.
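A sketch of this first fitting step with SciPy's L-BFGS-B optimizer follows; parameterizing the local frame by two angles, the bounds, the initial values, and the helper `render_specular_lumitexel` (which would render the specular Lumitexel over the virtual light sources of (6.3) for a given frame and roughness) are all assumptions of the illustration.

```python
import numpy as np
from scipy.optimize import minimize

def fit_frame_and_roughness(m_s, render_specular_lumitexel):
    """Fit the local frame (two angles) and the roughness (ax, ay) to the
    predicted specular Lumitexel m_s by L-BFGS-B.
    render_specular_lumitexel(theta, phi, ax, ay) is an assumed helper that
    returns the specular Lumitexel for those parameters."""
    def objective(x):
        theta, phi, ax, ay = x
        return np.sum((m_s - render_specular_lumitexel(theta, phi, ax, ay)) ** 2)

    x0 = np.array([0.0, 0.0, 0.1, 0.1])            # initial frame angles and roughness
    bounds = [(-np.pi, np.pi), (-np.pi, np.pi), (1e-3, 1.0), (1e-3, 1.0)]
    res = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
    return res.x
```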


2) Fitting Albedos


In this process, a trust region algorithm is used to solve the specular albedo and diffuse albedo, and the fitting objective satisfies:





$$\text{minimize}\ \sum_j\big\lVert B_j-\tilde{B}_j\big\rVert^2$$


where B̃_j represents the observed value of the camera after the pixel is irradiated by the optimized lighting pattern at the jth view direction used for fitting, and B_j represents the observed value synthesized, at the jth view direction used for fitting, from n′, t′ and the roughness obtained in the previous process. For a color light source, the synthesis parameters also include the calibrated color correction matrix δ(c1,c2,c3). The calculation process of B_j is as follows:


First, the diffuse albedo is set to 1 and the specular albedo to 0, and the coordinate system and roughness for rendering obtained in the previous step are used to render the diffuse Lumitexel ṁ_j^d in the jth view direction. Then the diffuse albedo is set to 0 and the specular albedo to 1, and the same coordinate system and roughness are used to render the specular Lumitexel ṁ_j^s in the jth view direction. The two Lumitexels are concatenated to form a matrix M_j = {ṁ_j^d, ṁ_j^s} with a size of N×2, where N is the number of light sources of the sampling device. The lighting pattern matrix W_l with a size of 3×N used in sampling is multiplied by M_j to obtain W_l M_j with a size of 3×2, which is further multiplied by an optimizable variable ρ_{d,s} with a size of 2×3 to obtain the following equation:





$$T=W_l\,M_j\,\rho_{d,s}$$


where the size of T is 3×3, and its copies are connected to form a tensor Ṫ = {T, T, T}, which is summed in the last two dimensions to finally get a three-dimensional vector B_j. For a color light source, Ṫ should first be multiplied element by element by the color correction matrix δ(c1,c2,c3) obtained by calibration, and then summed in the last two dimensions.
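A sketch of this synthesis of B_j follows, interpreting the tensor bookkeeping above as the per-channel contraction of T with δ(c1, c2, c3); the function name, argument layout and the single-channel fallback are assumptions of the illustration.

```python
import numpy as np

def synthesize_observation(w_l, m_d_j, m_s_j, rho_ds, delta=None):
    """Synthesize B_j from the lighting pattern and the two unit-albedo Lumitexels.

    w_l:     (3, N) lighting pattern matrix used in sampling.
    m_d_j:   (N,) diffuse Lumitexel rendered with rho_d = 1, rho_s = 0 (jth view).
    m_s_j:   (N,) specular Lumitexel rendered with rho_d = 0, rho_s = 1 (jth view).
    rho_ds:  (2, 3) optimizable diffuse / specular albedos per channel.
    delta:   optional (3, 3, 3) color correction matrix for a color light source.
    """
    m_j = np.stack([m_d_j, m_s_j], axis=1)         # M_j, size N x 2
    t = w_l @ m_j @ rho_ds                          # T = W_l M_j rho_{d,s}, size 3 x 3
    if delta is not None:
        # B_j[c3] = sum_{c1,c2} T[c1, c2] * delta[c1, c2, c3]
        return np.einsum("ab,abc->c", t, delta)
    return np.full(3, t.sum())                      # without delta: the same sum per channel
```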


A concrete example of the acquisition device is given here: FIG. 1 shows the system in three dimensions, FIG. 2 is a front view, and FIG. 3 is a side view. The acquisition device consists of a light board with a camera fixed on its upper part for acquiring images. There are 512 LED beads densely arranged on the light board; the beads are controlled by an FPGA, and their luminous brightness and lighting time can be adjusted.


An example of an acquisition system applying the method of the present application is given below, and the system is generally divided into the following modules:


A preparation module: providing data sets for network training. In this part, a GGX model is used, and a set of material parameters, pose information of k sampling points and camera positions are input to obtain a high-dimensional point cloud composed of k reflection situations. A network training part uses a Pytorch open source framework and uses an Adam optimizer for training. The network structure is shown in FIG. 6. Each rectangle represents a layer of neurons, and the numbers in the rectangle represent the number of neurons in this layer. The leftmost layer is the input layer and the rightmost layer is the output layer. The solid arrow between layers indicates full connection, and the dashed arrow indicates convolution.


An acquisition module: the device is shown in FIGS. 1, 2 and 3, and the specific structure has been described above. The size of the sampling space defined by this system and the spatial relationship between the acquisition device and the sampling space are shown in FIG. 4.


A reconstruction module: the geometric model of the sampled object obtained by three-dimensional reconstruction or the aligned geometric model of the sampled object scanned by the scanner is used to calculate the geometric model with texture coordinates, the trained neural network is loaded, the material feature vector is predicted for each vertex on the geometric model with texture coordinates, and the coordinate system and material parameters for rendering are fitted.



FIG. 5 is the workflow of this embodiment. Firstly, training data are generated: 200 million sets of material parameters are obtained by random sampling, 80% of which are taken as the training set and the rest as the test set. The Xavier method is used to initialize the parameters when training the network, and the learning rate is 1e−4. The lighting pattern is colored, and the size of the lighting matrix is (3, 512); the three rows of the matrix respectively represent the lighting patterns of the red, green and blue channels. After the training, the lighting matrix is taken out and transformed into the lighting pattern, and each column specifies the luminous intensity of the light source at the corresponding position. FIG. 7 shows the three-channel red, green and blue lighting pattern obtained by network training. The subsequent process is as follows: (1) the device is held by hand, the light board emits light according to the lighting pattern, and the camera shoots the object at the same time to get a set of sampling results; (2) for the geometric model of the sampled object obtained by 3D reconstruction or scanned by the scanner, a geometric model with texture coordinates is obtained by using Iso-charts; (3) for each vertex on the geometric model with texture coordinates, the corresponding valid real-shot data are found according to the poses and pixel values of the sampled photos, and a high-dimensional point cloud input is formed to reconstruct the diffuse reflection feature vector and the specular reflection feature vector; (4) according to the diffuse reflection feature vector and the specular reflection feature vector output by the network, the L-BFGS-B method is used to fit the coordinate system and roughness for rendering for each vertex, and a trust region algorithm is used to solve the specular albedo and the diffuse albedo.
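A minimal training-loop sketch matching this setup (Xavier initialization, Adam optimizer, learning rate 1e−4) is given below; the data-loader contents, batch layout and the reuse of the `material_loss` function sketched earlier are assumptions of the illustration.

```python
import torch
import torch.nn as nn

def train(model, loader, n_epochs=10, lr=1e-4, device="cpu"):
    """Train the network with Adam at lr = 1e-4 after Xavier initialization.
    `loader` is assumed to yield (lumitexels, poses, gt_diffuse, gt_specular,
    max_sampled, max_possible) batches; the loss is the material_loss sketch."""
    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)             # Xavier initialization
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(n_epochs):
        for lumi, poses, gt_d, gt_s, ms, mp in loader:
            lumi, poses = lumi.to(device), poses.to(device)
            gt_d, gt_s = gt_d.to(device), gt_s.to(device)
            m_d, m_s = model(lumi, poses)
            loss = material_loss(m_d, m_s, gt_d, gt_s, ms.to(device), mp.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
```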



FIG. 8 shows two Lumitexel vectors in the test set reconstructed by using the above system, with one column on the left being m̃_s and the corresponding column on the right being m_s.



FIG. 9 shows the results of the material attributes reconstructed by scanning the material appearance of the sampled object with the above system. The first row represents the three components (ρ_d^R, ρ_d^G, ρ_d^B) of the sampled object, the second row represents the three components (ρ_s^R, ρ_s^G, ρ_s^B) of the sampled object, and the third row represents the roughness coefficients α_x, α_y of the sampled object; the gray value represents the numerical value.


The above is only a preferred embodiment, and the present application is not limited to the above embodiments; any solution that achieves the technical effect of the present application by the same means should fall within the protection scope of the present application. Within the scope of protection of the present application, the technical solution and/or implementation thereof can have various modifications and changes.

Claims
  • 1. A freestyle acquisition method for high-dimensional materials, comprising a training stage and an acquisition stage; wherein the training stage comprises:(1) calibrating parameters of an acquisition device, and generating acquisition results simulating an actual camera as training data; and(2) training a neural network by using the generated training data, wherein the neural network has the following characteristics:(2.1) an input of the neural network is Lumitexel vectors under k unstructured samplings, where k represents a number of samplings, each value of Lumitexel describes the reflected luminous intensity of a sampling point along a specific view direction when illuminated by incident light from each light source; Lumitexel has a linear relationship with a luminous intensity of the light source, and the linear relationship is simulated by a linear fully connected layer;(2.2) a first layer of the neural network comprises the linear fully connected layer, the linear fully connected layer is configured to simulate an lighting pattern used during an actual acquisition and transform the k Lumitexel into camera acquisition results, and the k camera acquisition results are combined with pose information of the corresponding sampling point to form a high-dimensional point cloud;(2.3) a second layer is a feature extraction network, and the feature extraction network is configured to independently extract features from each point in the high-dimensional point cloud to obtain feature vectors;(2.4) after the feature extraction network, a max pooling layer is provided and configured to aggregate the feature vectors extracted from k unstructured views to obtain a global feature vector;(2.5) after the maximum pooling layer, a nonlinear mapping network is provided and configured to reconstruct high-dimensional material information according to the global feature vector;the acquisition stage comprises:(1) material acquisition: constantly irradiating, by the acquisition device, a target three-dimensional object according to the lighting pattern, obtaining, by a camera, a group of photos under unstructured views, and the photos are taken as an input to obtain a geometric model of the sampled object with texture coordinates and poses of the camera when taking the photos;(2) material reconstruction: according to the pose of the camera when taking the photos in the acquisition stage, obtaining a pose of a vertex corresponding to each effective texture coordinate on the sampled object when taking each photo; forming the high-dimensional point cloud as an input of the feature extraction network, as the second layer of the neural network, according to the acquired photos and pose information, and obtaining the high-dimensional material information through calculation.
  • 2. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein the unstructured sampling is a free random sampling with a non-fixed view direction, sampling data is disordered, irregular and uneven in distribution, a fixed object is capable of being used, and acquisition is capable of being carried out by a person holding the acquisition device, or an object is placed on a turntable for rotation and the acquisition device is fixed for acquisition.
  • 3. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein in the process of generating the training data, when the light source is colored, a spectral response relationship among the light source, the sampled object and the camera needs to be corrected, and a correction method comprises: a spectral distribution curve of an unknown color light source L is defined as Sc1L(λ), where λ represents a wavelength, c1 represents one of RGB channels, and a spectral distribution curve L(λ) of the light source with a luminous intensity of {IR,IG,IB} satisfies: L(λ)=IRSRL(λ)+IGSGL(λ)+IBSBL(λ)a reflection spectrum distribution curve p(λ) of any sampling point p is expressed as a linear combination of three unknown bases Sc2p(λ) with coefficients of pR, pG, pB respectively, and c2 represents one of the RGB channels: p(λ)=pRSRp(λ)+pGSGp(λ)+pBSBp(λ)the spectral distribution curve of a camera C is expressed as a linear combination of Sc3C(λ); under an illumination of a light source with a luminous intensity of {IR,IG,IB}, a measured value of the camera for a sampling point with a reflection coefficient of {pR,pG,pB} in a specific channel c3 satisfies:
  • 4. The freestyle acquisition method for high-dimensional materials according to claim 3, wherein in step (2.1) of the training stage, a relationship among an observed value B of a sampling point p on the surface of the object, a reflection function fr and the luminous intensity of each light source is capable of being described as follows:
  • 5. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein in the step (2.3) of the training stage, an equation of the feature extraction network is as follows: Vfeature(j)=f(concat[B(I,Pj),xpj,{circumflex over (n)}pj,{circumflex over (t)}pj]), 1≤j≤k where f represents a one-dimensional convolution function, with a convolution kernel size of 1×1, B(I,Pj) represents an output result of the first layer network or the acquired measured value, and xpj, {circumflex over (n)}pj, {circumflex over (t)}pj represent a spatial position of the sampling point, a geometric normal vector of the sampling point and a geometric tangent vector in a jth sampling, respectively, {circumflex over (n)}p is obtained by a geometric model, {circumflex over (t)}p represents an arbitrary unit vector orthogonal to {circumflex over (n)}p, and {circumflex over (n)}p, {circumflex over (t)}p are capable of being transformed by the pose of the camera at the time of the jth sampling to obtain {circumflex over (n)}pj, {circumflex over (t)}pj, and Vfeature(j) represents a feature vector of the network output at the time of the jth sampling.
  • 6. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein in the step (2.5) of the training stage, a nonlinear mapping network is formally expressed as follows: yi+1d=fi+1d(yidWi+1d+bi+1d), i≥1yi+1s=fi+1s(yisWi+1s+bi+1s), i≥1where fi+1 represents a mapping function of a (i+1)th layer network, Wi+1 represents a parameter matrix of the (i+1)th layer network, bi+1 represents an offset vector of the (i+1)th layer network, yi+1 represents an output of the (i+1)th layer network, d and s represent two branches of diffuse reflection and specular reflection, respectively, and the inputs y1d and y1s represent global feature vectors output by the max pooling layer.
  • 7. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein a loss function of the neural network is designed as follows: (1) a Lumitextel space is virtualized, the Lumitextel space is a cube with a center at a spatial position xp of the sampling point, an x-axis direction of a center coordinate system of the cube is {circumflex over (t)}p, and a z-axis direction is {circumflex over (n)}p, {circumflex over (n)}p represents a geometric normal vector, and {circumflex over (t)}p is an arbitrary unit vector orthogonal to {circumflex over (n)}p;(2) a camera is virtualized, and a view direction represents a positive direction of the z axis of the cube;(3) for diffuse Lumitexel, a resolution of the cube is 6×Nd2, and for specular Lumitexel, a resolution of the cube is 6×Ns2, Nd2 and Ns2 points are evenly sampled from each face as virtual point light sources with a luminous intensity of a unit luminous intensity;a) a specular albedo ρs of the sampling point is set to be 0, and a diffuse reflection feature vector {tilde over (m)}d in this Lumitexel space is generated;b) a diffuse albedo ρd is set to be 0, and a specular reflection feature vector {tilde over (m)}s in the Lumitexel space is generated;c) the output of the neural network are vectors md, ms, wherein md and {tilde over (m)}d have a same length and ms and {tilde over (m)}s have a same length, and the vectors md, ms are the predictions of the diffuse reflection feature vector {tilde over (m)}d and the specular reflection vector {tilde over (m)}s;(4) a loss function of a material feature part is expressed as follows: L=λdΣl[md(l)−{tilde over (m)}d(l)]2+λsβΣl[ms(l)−log(1+{tilde over (m)}s(l))]2 where λd and λs represent the loss weights of md and ms, respectively, and a confidence coefficient β is used to measure the loss of the specular reflection Lumitexel, and log acts on each dimension of the vector;the confidence coefficient β is determined as follows:
  • 8. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein in the acquisition stage, geometric alignment is carried out after completing the material acquisition, and material reconstruction is carried out after the geometric alignment; the geometric alignment comprises: a geometric model is obtained by scanning an object with a scanner, the scanned geometric model is aligned with a three-dimensional reconstructed geometric model, and the three-dimensional reconstructed geometric model is replaced with the three-dimensional scanned geometric model.
  • 9. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein for the effective texture coordinates, pixels in the photos are taken out sequentially according to the acquired photos and the pose information of the sampling points, the validity of the pixels is checked, and the high-dimensional point cloud is formed by combining the pixel value and the corresponding vertex pose; for a point p on the surface of the sampled object determined by the effective texture coordinates, a criterion that the jth sampling is valid for the vertex p is expressed as follows: (1) a position xpj of the vertex p is visible to the camera in this sampling, and xpj is located in a sampling space defined when training the network;(2), (ωo′⋅{circumflex over (n)}pj)>θ, (⋅) represents a dot product operation, θ represents a lower bound of an valid sampling direction, ωo′ represents a direction of outgoing light in a world coordinate system, and {circumflex over (n)}pj represents a normal vector of the vertex p of the jth sampling;(3) a numerical value of each channel of the pixels in the photos is in an interval [a, b], wherein a and b represent lower and upper bounds of an valid sampling brightness;when all three conditions are satisfied, the jth sampling is valid for the vertex p, and a result of the jth sampling is added to the high-dimensional point cloud.
  • 10. The freestyle acquisition method for high-dimensional materials according to claim 1, wherein the material parameters is capable of being fitted after reconstructing the material information, a process of fitting is divided into two steps: (1) fitting a local coordinate system and a roughness: for a point p on the surface of the sampled object determined by the effective texture coordinates, the local coordinate system and the roughness in the material parameters are fitted according to a single-channel specular reflection vector output by the network with an L-BFGS-B method; and(2) fitting albedos: the specular albedo and the diffuse albedo are solved by using a trust region algorithm, the local coordinate system and the roughness obtained in a previous process are fixed during solution, and an observed value is synthesized in the view direction used in the acquisition stage, so as to make the observed value as close as possible to an observed value obtained.
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of International Application No. PCT/CN2021/098576, filed on Jun. 7, 2021, the content of which is incorporated herein by reference in its entirety.

Continuations (1)
Number Date Country
Parent PCT/CN2021/098576 Jun 2021 US
Child 18493831 US