The present disclosure is generally related to registering clothing. More specifically, the present disclosure includes registering clothing with accurate deformations by leveraging a shape prior learned from pre-captured clothing using diffusion models.
Virtual presentations have become increasingly focal to social presence, self-expression through visual features, and the like. How we dress is important in the perception of identity. The digitization of dynamically deforming clothes is, therefore, a core aspect to enabling genuine social interaction in virtual environments. Digitization of clothes may be implemented in a myriad of applications including photorealistic telepresence, virtual try-on, and visual effects for games and movies. Photorealistic appearance of clothes may be modeled using computer vision and graphics. However, there is a need for providing more accurate registration of clothing with large deformations (e.g., wrinkles).
The subject disclosure provides for systems and methods for cloth registration. Specifically, embodiments enable accurate registration of textureless clothes with large deformations by leveraging a shape prior learned from pre-captured clothing using diffusion models. The registration process is stably guided via a multi-stage guidance sampling process.
According to certain aspects of the present disclosure, embodiments include a means for obtaining an input scan including at least cloth, a means for generating a mesh of the cloth in the scan based on a shape prior, and registering means for cloth registration comprising: guiding deformation of the cloth based on a coarse registration signal based on the mesh, and guiding the deformation of the cloth based on a distance between points in the mesh and a template mesh.
In one aspect of the present disclosure, the method includes obtaining an input scan including at least cloth, generating a mesh representing the cloth in the scan based on a shape prior, and registering a model of the cloth from the scan, the registering comprising: guiding deformation of the cloth based on a coarse registration signal based on the mesh, and guiding the deformation of the cloth based on a distance between points in the mesh and a template mesh.
Another aspect of the present disclosure relates to a system configured for cloth registration. The system includes one or more processors, and a memory storing instructions which, when executed by the one or more processors, cause the system to obtain an input scan including at least cloth, generate a mesh representing the cloth in the scan based on a shape prior, guide a first deformation of the cloth based on a coarse registration signal based on the mesh, guide a second deformation of the cloth based on a distance between points in the mesh and a template mesh, and register a model of the cloth from the scan based on the first deformation and the second deformation, wherein the first deformation corresponds to large deformations and the second deformation corresponds to detailed deformations.
Yet another aspect of the present disclosure relates to a non-transient computer-readable storage medium having instructions embodied thereon, the instructions being executable by one or more processors to perform a method(s) for cloth registration described herein. The method may include obtaining an input scan including at least clothing, generating a mesh representing the clothing in the scan based on a shape prior, guiding a first deformation of the clothing based on a coarse registration signal and 3D point cloud corresponding to the mesh, guiding a second deformation of the clothing based on a distance between points in the 3D point cloud and a template mesh, and registering a model of the clothing from the scan based on the first deformation and the second deformation, wherein the first deformation corresponds to large deformations and the second deformation corresponds to detailed deformations.
These and other embodiments will be evident from the present disclosure.
In the figures, elements having the same or similar reference numerals are associated with the same or similar attributes, unless explicitly stated otherwise.
In the following detailed description, numerous specific details are set forth to provide a full understanding of the present disclosure. It will be apparent, however, to one ordinarily skilled in the art, that the embodiments of the present disclosure may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail so as not to obscure the disclosure.
Virtual presentations have become increasingly focal to social presence, self-expression through visual features, and the like. How we dress is important in the perception of identity. The digitization of dynamically deforming clothes is, therefore, a core aspect to enabling genuine social interaction in virtual environments. Digitization of clothes may be implemented in a myriad of applications including photorealistic telepresence, virtual try-on, and visual effects for games and movies. Photorealistic appearance of clothes may be modeled using computer vision and graphics. However, non-rigid 3D registration is a long-standing problem in the field of computer vision and graphics, resulting in the need for more accurate registration of cloth (e.g., clothing) with large deformations.
Surface registration may be used in modeling photorealistic appearance and geometric deformations by establishing correspondences between a template model and observed three-dimensional (3D) reconstruction at each time frame. However, traditional methods of surface registration suffer from in-plane sliding of the vertices due to the lack of geometric constraints, making the registration results unsuitable for learning clothes characteristics such as physical parameters or statistical models of deformations. To avoid sliding (e.g., in-plane sliding), these methods rely on texture, i.e., photometric consistency, to establish correspondence between the template and the observed images. Due to the reliance on the texture, the performance of the registration is highly dependent on the uniqueness and contrast of the texture, making it unsuitable for regions without salient patterns. Textureless registration approaches (e.g., regularizing deformations) do not transfer well to clothing because cloth deformation is highly complex, including stretching and bending, and they additionally require hyperparameter tuning to balance the registration objective against regularization. Further, modeling large deformation and fine details of clothing simultaneously can be difficult.
Embodiments, as disclosed herein, provide a solution to the above-mentioned problems rooted in computer technology, namely, effectively leveraging a shape prior from real-world clothing deformations for cloth registration. To achieve this, embodiments as disclosed herein present a diffusion-based shape prior that can effectively encode highly complex clothing geometry. The cloth registration approach leverages the diffusion-based shape prior to achieve accurate cloth registration even in a textureless setting. The disclosed subject technology further provides improvements to the functioning of the computer itself because it achieves accurate registration of clothing under large motion even without texture information. Accordingly, the disclosed technology also improves the technological field by providing photorealistic cloth registration with increased accuracy including, for example, wrinkle-accurate cloth registration.
According to embodiments, a diffusion model is employed to learn complex shape distributions of cloth from real pre-captured clothing. Diffusion models are a class of generative models that can learn the prior from highly complex data distributions by score matching. In embodiments, the diffusion model is used to estimate both deformation and detailed deformation in a unified model by integrating it into an optimized framework. Using real-world clothing to train the diffusion model provides an improved understanding of intricate deformations and interactions with human body parts, which are not precisely synthesized by physics-based simulation. In addition, learning the shape prior from real-world clothing deformations can constrain the solution space within the span of plausible clothing deformations, and also avoids heuristics and the need for parameter tuning.
According to embodiments, a multi-stage posterior sampling process based on learned functional maps is employed to stabilize registrations for large scale deformation, even when the deformations vary significantly from training data.
The multi-stage posterior sampling process may include, in the early stages, denoising guided by a learning-based coarse registration approach and, in the later stages, refinement with point-to-plane errors. In this way, the registration can avoid local minima while retaining high-fidelity cloth appearance (e.g., wrinkles) with faithful surface deformations.
According to embodiments, a ground-truth correspondence may be obtained by a tracking method based on clothes with a special printed pattern. The ground-truth correspondence may be used to evaluate the accuracy of surface registrations in real data based on the diffusion-based shape prior of embodiments. In some embodiments, garment-specific shape priors are directly learned from high-quality ground-truth (e.g., ground-truth 4D scans) to perform more accurate registration. Some embodiments also illustrate an evaluation of the ground-truth from a wide range of motions and contact of real clothes, quantitatively demonstrating the accuracy of several registration methods in real-world scenarios.
Network 150 can include, for example, any one or more of a local area network (LAN), a wide area network (WAN), the Internet, and the like. Further, network 150 can include, but is not limited to, any one or more of the following network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, tree or hierarchical network, and the like.
Client device 110 may also include a processor 212-1, configured to execute instructions stored in a memory 220-1, and to cause client device 110 to perform at least some operations in methods consistent with the present disclosure. Memory 220-1 may further include an application 222 and a GUI 225, configured to run in client device 110 and couple with input device 214 and output device 216. The application 222 may be downloaded by the user from server 130 and may be hosted by server 130. The application 222 includes specific instructions which, when executed by processor 212-1, cause operations to be performed according to methods described herein. In some embodiments, the application 222 runs on an operating system (OS) installed in client device 110. In some embodiments, application 222 may run out of a web browser. In some embodiments, the processor is configured to control a graphical user interface (GUI) (e.g., via GUI 225) for the user of one of client devices 110 accessing the server of the social platform, immersive reality application, or the like.
A database 252 may store data and files associated with the social platform from the application 222. In some embodiments, client device 110 is a mobile phone used to collect a video or picture and upload to server 130 using a video or image collection application 222, to store in the database 252.
Server 130 includes a memory 220-2, a processor 212-2, and communications module 218-2. Hereinafter, processors 212-1 and 212-2, and memories 220-1 and 220-2, will be collectively referred to, respectively, as “processors 212” and “memories 220.” Processors 212 are configured to execute instructions stored in memories 220. In some embodiments, memory 220-2 includes an engine 232. The engine 232 may share or provide features and resources to GUI 225, including multiple tools associated with image or video collection, capture, or design applications that use images or pictures (e.g., application 222) retrieved with engine 232. The user may access engine 232 through application 222, installed in a memory 220-1 of client device 110. Accordingly, application 222, including GUI 225, may be installed by server 130 and perform scripts and other routines provided by server 130 through any one of multiple tools. Execution of application 222 may be controlled by processor 212-1.
Engine 232 may include one or more modules (e.g., modules later described with reference to system 300 in
The shape prior tool 242 may be configured to learn, for example, given ground-truth 4D scans of cloth in motion, a shape prior using the diffusion model to simultaneously encode large deformation and fine details. The registration tool 244 can use the learned shape prior to register the same clothing to noisy 4D scans via a multi-stage manifold guidance process. In some embodiments, in a first stage of the multi-stage manifold guidance, the shape prior relies on a coarse registration signal to achieve rough alignment. By non-limiting example, the coarse registration signal may be acquired by markers, visual-based tracking, geometric-based tracking, or any combination of them. In some implementations, the coarse registration signal may be acquired based on geometric information. The alignment tool 246, in a second stage of manifold guidance, uses the shape prior to further refine the alignment of the cloth registration to achieve wrinkle-accurate registration by considering spatial proximity based on the 4D scans.
The engine 232 may include a neural network tool which may be part of one or more machine learning models stored in the database 252. The database 252 includes training archives and other data files that may be used by engine 232 in the training of a machine learning model, according to the input of the user through application 222. Moreover, in some embodiments, at least one or more training archives or machine learning models may be stored in either one of memories 220. The neural network tool may include algorithms trained for the specific purposes of the engines and tools included therein. The algorithms may include machine learning or artificial intelligence algorithms making use of any linear or non-linear algorithm, such as a neural network algorithm, or multivariate regression algorithm. In some embodiments, the machine learning model may include a neural network (NN), a convolutional neural network (CNN), a generative adversarial neural network (GAN), a deep reinforcement learning (DRL) algorithm, a deep recurrent neural network (DRNN), a classic machine learning algorithm such as random forest, k-nearest neighbor (KNN) algorithm, k-means clustering algorithms, or any combination thereof. More generally, the machine learning model may include any machine learning model involving a training step and an optimization step. In some embodiments, the database 252 may include a training archive to modify coefficients according to a desired outcome of the machine learning model. Accordingly, in some embodiments, engine 232 is configured to access database 252 to retrieve documents and archives as inputs for the machine learning model. In some embodiments, engine 232, the tools contained therein, and at least part of database 252 may be hosted in a different server that is accessible by server 130 or client device 110.
According to embodiments, cloth is registered based on ground-truth 4D scans 302-1, 302-2, and 302-3 (hereafter simply referenced as “scans 302”) of cloth in motion. The scans 302 are used to train the diffusion module 304a. Embodiments are not limited to this and the training dataset may include, for example, an image or video dataset comprising clothing and/or pattern-based cloth registration datasets that provide a template geometry for each clothing type, as well as accurate registrations in the same topology.
The cloth geometry in the scans 302 may be represented as a 3D triangle mesh (with V vertices, whose positions lie in ℝ^(V×3), and F triangles), where the i-th vertex position is denoted as vi. According to embodiments, the scans 302 are mapped to 2D UV (texture-coordinate) surfaces 306-1, 306-2, and 306-3 (hereinafter, collectively referred to as “surfaces 306”). For each scan, there exists a surjective map from a 3D vertex index to a 2D UV surface (i.e., u=ϕ(i)) where every 3D vertex index maps to at least one coordinate of the 2D UV surface. The surfaces 306 are used, by diffusion module 304a, to learn a diffusion-based shape prior. In some embodiments, the surfaces 306 are represented as 3D point clouds.
According to embodiments, a displacement of the surfaces 306 may be defined from a mean template shape with vertices v̄ as a function of a UV coordinate (i.e., x|u=vi−v̄i, where u=ϕ(i)).
The coarse registration module 308 generates a coarse registration signal based on the surfaces 306. The coarse registration signal may correspond to a registered point cloud. Registering cloth using the coarse registration signal further improves the accuracy and regularizes the deformations. The coarse registration module 308 is trained based on the mean template shape and random sub-sampling on mesh vertices of the surfaces 306. The coarse registration module 308 learns to predict per-vertex 3D flow from the mean template shape to a target shape given the 3D point clouds (of surfaces 306) as input. In some embodiments, where markers or visual information are not available, the coarse registration module 308 is trained to establish a putative correspondence between each 3D point cloud pair and to estimate a set of functions Φk for each 3D point cloud. The coarse registration module 308 jointly models the correspondences between each 3D point cloud pair and the set of functions Φk to obtain a functional map. During test time with multiple inputs, the coarse registration module 308 estimates a set of maps for all pairs and registers cloth based on the learned functional maps.
The diffusion module 304a is configured to train a diffusion-based model to learn a shape deformation space based on the surfaces 306. The diffusion-based model can then generate plausible shape deformations. According to some embodiments, learning the shape deformation space includes learning a prior distribution of deformation such that a random sampling in the distribution leads to accurate deformation predictions. The diffusion-based model may include forward and reverse diffusion processes.
In the forward process 410, noise is gradually added to the UV displacement map (x0) to acquire an isotropic Gaussian distribution (xT). As shown in
where ϵ ~ N(0, I) is a sample from the Gaussian distribution and βt is the variance schedule. To ensure coarse-to-fine shape learning, the variance schedule βt increases gradually with t. According to Equation (1), increasing t reduces the impact of xt-1 while increasing the impact of Gaussian noise, ensuring the coarse-to-fine shape learning where large deformation (low-frequency) is modeled at large t (later diffusion stage), and small deformation (high-frequency) is modeled at small t.
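The forward process can be illustrated with a minimal sketch. It assumes the standard denoising-diffusion forward step, x_t = sqrt(1 − βt)·x_{t-1} + sqrt(βt)·ϵ, which matches the description of Equation (1) above but is not confirmed by it. The linear schedule, its endpoint values, the number of steps, and the UV map size are all hypothetical choices made for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Monotonically increasing variance schedule (endpoint values are assumed)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_step(x_prev, beta_t, rng):
    """One forward step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

def forward_process(x0, betas, rng):
    """Apply every forward step and return the list [x_0, x_1, ..., x_T]."""
    xs = [x0]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

rng = np.random.default_rng(0)
x0 = 0.05 * rng.standard_normal((3, 16, 16))  # hypothetical UV displacement map
betas = linear_beta_schedule(1000)
trajectory = forward_process(x0, betas, rng)

# Fraction of the original signal x_0 that survives in x_t; it shrinks as t grows,
# so fine detail is lost first and only coarse structure remains at large t.
signal_scale = np.sqrt(np.cumprod(1.0 - betas))
```

Under these assumptions, `signal_scale` decays monotonically toward zero, so `trajectory[-1]` is close to isotropic Gaussian noise regardless of the starting displacement map.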
In the reverse process 420, the UV displacement map x0 is recovered in order to sample from the learned prior distribution of deformation by gradually denoising a corrupted UV displacement map. As shown in
The denoising, in accordance with Equation (2), is an ancestral sampling operation wherefrom a plausible data sample of the UV displacement map x0 is generated by iterating through xt-1 from the learned prior (e.g., shape prior 310). Based on the variance schedule βt, the following are defined as:
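The definitions are not reproduced in this text. As a hedged reconstruction, assuming the standard denoising-diffusion formulation that the surrounding description follows, the quantities derived from the variance schedule and the ancestral sampling step would take the form below. The symbols σ_t and z are notation introduced here for illustration, and ϵ_θ is the noise-prediction network described in the next paragraph.

```latex
\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

x_{t-1} = \frac{1}{\sqrt{\alpha_t}}
          \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,
                 \epsilon_\theta(x_t, t) \right)
          + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)
```

Here σ_t is the standard deviation of the noise injected at step t; a common choice is σ_t² = β_t, although the disclosure does not state which value is used.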
The learnable neural network ϵθ parameterized by θ aims to predict the noise ϵ from corrupted data. In some embodiments, the diffusion module 304a trains the neural network ϵθ with a weighted variational bound as the objective defined as:
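The objective itself is not reproduced in this text. One widely used weighted variational bound is the simplified noise-prediction loss, the mean squared error between the sampled noise ϵ and the network output ϵθ(xt, t). The sketch below assumes that form. The function name, the stand-in predictors, and the schedule values are hypothetical, and a real implementation would use a trained neural network in place of `eps_model`.

```python
import numpy as np

def noise_prediction_loss(eps_model, x0, t, alpha_bar, rng):
    """Simplified noise-prediction objective at time step t (1-indexed).

    x_t is drawn in closed form: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    eps_hat = eps_model(x_t, t)
    return float(np.mean((eps - eps_hat) ** 2))

betas = np.linspace(1e-4, 2e-2, 1000)   # assumed schedule
alpha_bar = np.cumprod(1.0 - betas)
x0 = np.zeros((3, 8, 8))                # hypothetical all-zero displacement map

def zero_model(x_t, t):
    """Stand-in predictor that always predicts zero noise."""
    return np.zeros_like(x_t)

def oracle_model(x_t, t):
    """Stand-in predictor that is exact only because x0 is all zeros here."""
    return x_t / np.sqrt(1.0 - alpha_bar[t - 1])

loss_zero = noise_prediction_loss(zero_model, x0, 500, alpha_bar, np.random.default_rng(0))
loss_oracle = noise_prediction_loss(oracle_model, x0, 500, alpha_bar, np.random.default_rng(0))
```

The oracle predictor drives the loss to essentially zero, while the zero predictor leaves it near the variance of the noise, which is the behavior training is meant to improve on.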
According to embodiments, seam stitching is performed to prevent the clothing from separating at the seam. In some embodiments, the clothing may be stitched at the seam at every time step in the reverse process 420 (i.e., x′t-1=Φ(Ψ(xt-1))). The mapping from UV to mesh space is not injective. As such, ambiguities may be solved by averaging the 3D locations of UV positions which refer to the same point/UV coordinate.
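The averaging described above can be sketched as one round trip between UV space and mesh space. The data layout is an assumption made for illustration: each valid texel is listed by its row, column, and the index of the mesh vertex it refers to, so a vertex on a seam appears more than once. The function names are hypothetical.

```python
import numpy as np

def uv_to_mesh(uv_map, rows, cols, vertex_ids, num_vertices):
    """Average the texel values that refer to the same vertex (resolves seam duplicates)."""
    sums = np.zeros((num_vertices, uv_map.shape[-1]))
    counts = np.zeros(num_vertices)
    np.add.at(sums, vertex_ids, uv_map[rows, cols])
    np.add.at(counts, vertex_ids, 1.0)
    return sums / np.maximum(counts, 1.0)[:, None]

def mesh_to_uv(vertices, rows, cols, vertex_ids, uv_shape):
    """Write each vertex value back to every texel that refers to it; other texels stay zero."""
    uv_map = np.zeros(uv_shape)
    uv_map[rows, cols] = vertices[vertex_ids]
    return uv_map

def stitch_seams(uv_map, rows, cols, vertex_ids, num_vertices):
    """Round trip through mesh space so that duplicated seam texels agree."""
    vertices = uv_to_mesh(uv_map, rows, cols, vertex_ids, num_vertices)
    return mesh_to_uv(vertices, rows, cols, vertex_ids, uv_map.shape)

# Toy example: a 2x2 UV map in which vertex 1 lies on a seam and owns two texels.
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 0, 1])
vertex_ids = np.array([0, 1, 1, 2])
uv_map = np.zeros((2, 2, 3))
uv_map[0, 1] = [1.0, 0.0, 0.0]   # first copy of vertex 1
uv_map[1, 0] = [3.0, 0.0, 0.0]   # second copy of vertex 1, which disagrees
uv_map[1, 1] = [5.0, 5.0, 5.0]   # vertex 2, not on the seam
stitched = stitch_seams(uv_map, rows, cols, vertex_ids, num_vertices=3)
```

After stitching, both texels of the seam vertex hold the average of the two conflicting values, and texels of vertices away from the seam are unchanged.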
Returning to
According to some embodiments, the coarse registration guidance 330 leverages the coarse registration signal to identify an alignment of cloth registration. As shown in
As described with reference to
where d is a multi-stage distance measurement in Euclidean space, and ρ is the step size of guidance. The posterior mean x̂0 can be estimated from xt, where:
Guidance parameters of the coarse registration guidance 330 and the spatial proximity guidance 340, based on the distance measurement d with the decreasing time step t in the reverse diffusion process, can be defined by:
where ṽi is the target position predicted by the coarse registration module 308. The mesh vertices are mapped from the UV displacement map as V̂=Φ(x̂0), with the i-th vertex denoted as v̂i. The time step τ is the point where the distance measurement is changed. A nearest-neighbor query retrieves the closest point yi in the input point cloud, and ny denotes the surface normal at that point.
According to some embodiments, when t>τ, the coarse registration guidance 330 uses the coarse registration signal generated by coarse registration module 308 to guide large deformations of the clothing shape. The coarse registration signal is used (at coarse registration guidance 330) to guide the large deformations until a rough alignment with the input 3D point cloud is achieved. After t≤τ, the vertices are guided at the spatial proximity guidance 340 by point-to-plane errors based on spatial proximity from the point yi. Using a point-to-plane distance helps to avoid overestimating the distance measurement d when the input point cloud contains holes.
In some embodiments, after reaching t=0 in the reverse process 420, the final denoising step is repeated with point-to-plane guidance to adjust the inferred vertices to the high-frequency surface details of the point cloud.
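The two guidance stages can be sketched as follows. The guidance equations are not reproduced in this text, so this sketch assumes the standard posterior-mean estimate, a squared point-to-point distance to the coarse targets for t>τ, and a squared point-to-plane distance to the closest scan point for t≤τ. For simplicity the gradient is taken with respect to the mesh vertices with correspondences held fixed, rather than through the denoising network. All function names, the brute-force nearest-neighbor search, and the numeric values are hypothetical.

```python
import numpy as np

def posterior_mean(x_t, eps_hat, alpha_bar_t):
    """Estimate x_0 from x_t and the predicted noise (standard diffusion identity)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def closest_points(vertices, cloud_points, cloud_normals):
    """Brute-force search: for each vertex, the closest scan point and its normal."""
    d2 = ((vertices[:, None, :] - cloud_points[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return cloud_points[idx], cloud_normals[idx]

def guidance_distance_and_grad(vertices, t, tau, coarse_targets, cloud_points, cloud_normals):
    """Return (d, gradient of d with respect to the vertices) for the current stage."""
    if t > tau:
        # Early stage: pull vertices toward the coarse registration targets.
        residual = vertices - coarse_targets
        return float((residual ** 2).sum()), 2.0 * residual
    # Late stage: point-to-plane error against the closest scan point.
    y, n = closest_points(vertices, cloud_points, cloud_normals)
    signed = ((vertices - y) * n).sum(-1)
    return float((signed ** 2).sum()), 2.0 * signed[:, None] * n

# Posterior-mean check on synthetic values.
x0_true = np.ones((2, 3))
eps = np.full((2, 3), 0.3)
x_t = np.sqrt(0.5) * x0_true + np.sqrt(0.5) * eps
x0_hat = posterior_mean(x_t, eps, 0.5)

# Toy scan: a flat patch at z = 0 with upward normals; vertices hover 0.1 above it.
vertices = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.1]])
cloud_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
cloud_normals = np.tile(np.array([0.0, 0.0, 1.0]), (3, 1))
coarse_targets = vertices + np.array([0.5, 0.0, 0.0])
tau, rho = 200, 0.5  # assumed switch point and guidance step size

d_coarse, _ = guidance_distance_and_grad(vertices, 800, tau, coarse_targets, cloud_points, cloud_normals)
d_fine, g_fine = guidance_distance_and_grad(vertices, 100, tau, coarse_targets, cloud_points, cloud_normals)
refined = vertices - rho * g_fine
d_after, _ = guidance_distance_and_grad(refined, 100, tau, coarse_targets, cloud_points, cloud_normals)
```

Because the late-stage distance only measures the offset along the scan normal, a vertex whose closest scan point is displaced within the surface, for example next to a hole, is not penalized for that tangential offset.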
The computing platform(s) 702 can maintain or store data, such as in the electronic storage 726, including correlation and contextual data used by the computing platform(s) 702. The computing platform(s) 702 may be configured to communicate with one or more remote platforms 704 according to a client/server architecture, a peer-to-peer architecture, and/or other architectures. The remote platform(s) 704 may be configured to communicate with other remote platforms via computing platform(s) 702 and/or according to a client/server architecture, a peer-to-peer architecture, and/or other architectures. The remote platform(s) 704 can be configured to cause output of the system 700 on client device(s) of the remote platform(s) 704 with enabled access.
The computing platform(s) 702 may be configured by machine-readable instructions 706. The machine-readable instructions 706 may be executed by the computing platform(s) to implement one or more instruction modules. The instruction modules may include computer program modules. The instruction modules being implemented may include one or more of obtaining module 708, shape representation module 710, coarse guidance module 712, proximity guidance module 714, registration module 716, training module 718 and/or other instruction modules.
The obtaining module 708 may be configured to obtain input 4D scans. The scans may be of a cloth in motion captured by at least one camera or scanning device (e.g., client device 110). According to some embodiments, the cloth may be textureless.
The shape representation module 710 may be configured to generate a mesh representing the cloth in the scan based on a shape prior using UV parameterization. The shape prior is trained to encode highly complex clothing geometry, as described herein. According to embodiments, the shape prior is learned from pre-captured clothing using a diffusion model. The mesh is represented as a 3D point cloud. In some embodiments, a displacement map is generated mapping 3D vertex indices of the mesh to a 2D UV surface based on a mean shape, with each vertex index of the mesh mapped to one or more points in a 3D surface. A point-to-plane distance measurement from the mesh (e.g., points of the 3D point cloud) to the mean shape may be determined based on the 3D surface of the input scan.
The coarse guidance module 712 may be configured to, based on the mesh, guide deformations of a model of the cloth based on a coarse registration signal until an alignment between the model and the mesh reaches a threshold.
In some embodiments, the system 700 may include, as input, a time step corresponding to the point where the distance between points in the mesh and a template (mean shape) changes. Given a current time step, a distance (e.g., point-to-point/point-to-plane) is measured and displacements are updated to minimize error.
In some embodiments, the system 700 may include a coarse signal generation module configured to generate the coarse registration signal by predicting per-vertex 3D flow from the template shape to a target shape (e.g., the cloth) given a 3D point cloud as input (e.g., the mesh).
The proximity guidance module 714 may be configured to guide deformations of the model by point-to-plane errors based on spatial proximity from points on the displacement map. For example, given a current time step, a distance measurement change is determined, and the vertices of the mesh are guided by point-to-plane errors.
The registration module 716 may be configured to register the model of the cloth from the scan based on the guided deformations providing non-rigid 3D, deformation (e.g., wrinkle) accurate cloth registration.
The training module 718 may be configured to train the diffusion model generating the shape prior based on 4D scans of clothing from a dataset. The training may include generating a mesh representation of the 4D scans. The system 700 may be further configured to gradually add random noise to the mesh to generate a corrupted UV map, wherein gradually adding the noise trains the model to learn a transition probability based on the mesh and the random noise. A variance schedule may be increased as the random noise is added. The system 700 may be further configured to reconstruct the mesh from the random noise. The reconstructing may include denoising based on the variance schedule. Denoising may include predicting the random noise from the corrupted UV map using a neural network trained with a weighted variational bound. The reconstructed mesh may be used to generate shape deformations for cloth registration.
In some implementations, the computing platform(s) 702, the remote platform(s) 704, and/or the external resources 724 may be operatively linked via one or more electronic communication links. For example, such electronic communication links may be established, at least in part, via, e.g., the network 150 such as the Internet and/or other networks. It will be appreciated that this is not intended to be limiting, and that the scope of this disclosure includes implementations in which the computing platform(s) 702, the remote platform(s) 704, and/or the external resources 724 may be operatively linked via some other communication media.
A given remote platform 704 may include client computing devices, such as the client device 110, which may each include one or more processors configured to execute computer program modules (e.g., the instruction modules). The computer program modules may be configured to enable an expert or user associated with the given remote platform 704 to interface with the system 700 and/or external resources 724, and/or provide other functionality attributed herein to remote platform(s) 704. By way of non-limiting example, a given remote platform 704 and/or a given computing platform 702 may include one or more of a server, a desktop computer, a laptop computer, a handheld computer, a tablet computing platform, a NetBook, a Smartphone, a gaming console, and/or other computing platforms. The external resources 724 may include sources of information outside of the system 700, external entities participating with the system 700, and/or other resources. For example, the external resources 724 may include externally designed XR elements and/or XR applications designed by third parties. In some implementations, some or all of the functionality attributed herein to the external resources 724 may be provided by resources included in system 700.
Computing platform(s) 702 may include electronic storage 726, one or more processors 730, and/or other components. Computing platform(s) 702 may include communication lines, or ports to enable the exchange of information with a network and/or other computing platforms. Illustration of the computing platform(s) 702 in
Electronic storage 726 may comprise non-transitory storage media that electronically stores information. The electronic storage media of electronic storage 726 may include one or both of system storage that is provided integrally (i.e., substantially non-removable) with computing platform(s) 702 and/or removable storage that is removably connectable to computing platform(s) 702 via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 726 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage 726 may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). Electronic storage 726 may store software algorithms, information determined by processor(s) 730, information received from computing platform(s) 702, information received from remote platform(s) 704, and/or other information that enables computing platform(s) 702 to function as described herein.
Processor(s) 730 may be configured to provide information processing capabilities in computing platform(s) 702. As such, processor(s) 730 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Although processor(s) 730 is shown in
It should be appreciated that although modules 708, 710, 712, 714, 716 and/or 718 are illustrated in
The techniques described herein may be implemented as method(s)/process(es) that are performed by physical computing device(s); as one or more non-transitory computer-readable storage media storing instructions which, when executed by computing device(s), cause performance of the method(s); or as physical computing device(s) that are specially configured with a combination of hardware and software that causes performance of the method(s).
At step 802, the process 800 may include obtaining an input scan including at least a cloth. At step 804, the process 800 may include generating a mesh representing the cloth in the scan based on a shape prior. According to an aspect of embodiments, the shape prior is a diffusion-based shape prior learned from pre-captured 4D data. According to an aspect of embodiments, the mesh is represented as a 3D point cloud and each vertex index of the 3D point cloud is mapped to one or more points in a 3D surface, wherein the distance from the mesh to the template mesh corresponding to the mean shape is based on the 3D surface.
At step 806, the process 800 may include guiding deformation of the cloth based on a coarse registration signal based on the mesh. According to an aspect of embodiments, the deformation of the cloth is guided based on the coarse registration signal until an alignment of a model registration of the cloth with the mesh reaches a threshold. According to an aspect of embodiments, the coarse registration signal is generated by predicting per-vertex 3D flow from a mean template shape to the cloth based on the mesh.
At step 808, the process 800 may include guiding the deformation of the cloth based on a distance between points in the mesh and a template mesh, thereby refining the model based on point-to-plane errors. The guidance in steps 806 and 808 may be posterior sampling processes, as described herein.
At step 810, the process 800 may include registering the model of the cloth from the scan based on the guidance (at step 806 and step 808). As such, the model registration of the cloth is based on a multistage guidance scheme and accounts for both large deformation and detailed deformation by using the shape prior to further refine the alignment, achieving wrinkle-accurate registration by considering spatial proximity with the input scan.
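The two-stage guidance of steps 806 through 810 can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the function names, the step size, the step count, and the gradient-style update are assumptions chosen for illustration, and `prior_step` is a hypothetical stand-in for one reverse step of the learned diffusion shape prior. The sketch only shows how guidance may switch from the coarse registration signal (while t≥τ) to a point-to-plane distance to the scan (once t<τ).

```python
def coarse_gradient(x, coarse):
    """Gradient of 0.5 * ||x - coarse||^2 for each vertex (stage 1)."""
    return [[a - b for a, b in zip(v, c)] for v, c in zip(x, coarse)]


def point_to_plane_gradient(x, scan_points, scan_normals):
    """Gradient of 0.5 * d^2 per vertex (stage 2), where d is the signed distance
    to the plane through the nearest scan point (brute-force nearest search)."""
    grads = []
    for v in x:
        j = min(range(len(scan_points)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, scan_points[k])))
        n = scan_normals[j]
        d = sum((a - b) * c for a, b, c in zip(v, scan_points[j], n))
        grads.append([d * c for c in n])
    return grads


def guided_registration(x_init, coarse, scan_points, scan_normals,
                        prior_step, num_steps=50, tau=20, step_size=0.2):
    """Run num_steps guided updates, switching guidance at breakpoint tau."""
    x = [list(v) for v in x_init]
    for t in range(num_steps, 0, -1):
        x = prior_step(x, t)  # hypothetical stand-in for one reverse diffusion step
        if t >= tau:
            g = coarse_gradient(x, coarse)  # coarse registration guidance
        else:
            g = point_to_plane_gradient(x, scan_points, scan_normals)  # refinement
        x = [[a - step_size * b for a, b in zip(v, gv)] for v, gv in zip(x, g)]
    return x


# Toy usage: flat scan at z = 1 and a slightly wrong coarse estimate at z = 0.9.
identity_prior = lambda x, t: x  # placeholder; a real prior would denoise x
scan_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
scan_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
coarse = [(0.0, 0.0, 0.9), (1.0, 0.0, 0.9)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
registered = guided_registration(template, coarse, scan_points, scan_normals,
                                 identity_prior)
```

In this toy setting the first stage brings the template close to the imperfect coarse estimate, and the second stage corrects the remaining offset by pulling each vertex onto the scan surface along the scan normal.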
Although
Referring to
In registrations 900/1000, the middle-left (row 916/1016) is the input point cloud, the bottom-left is the ground truth, and the top-left is a zoomed-in view of the ground truth. The rest are the results of different methods, where the top row shows a side-by-side comparison with the ground truth, the middle row shows the geometry with normal rendering, and the bottom row shows the vertex error Ev (0 mm to >50 mm) as a heat map.
Diffusion model registrations 914/1014 are generated with the diffusion-based model, described in model architecture 300, according to embodiments. As shown in
Table 1 below is a comparison (where bold indicates the best results and bold italic indicates the second best results) of the performance of registrations 900 and registrations 1000 using different metrics (measured in mm). In Table 1, for each data sequence, the frames are split into a training set and a testing set, and the testing set is further divided into interpolation and extrapolation sets. The ground-truth registrations 902/1002 are used as ground truth for both training and testing. The interpolation testing set is uniformly sampled from the entire sequence, so its data distribution is similar to that of the training set. The extrapolation testing set is a manually selected short sequence consisting of body poses unseen in the training set.
[Table 1: row and column labels are not recoverable. Surviving values (mm): 5.04, 15.87, 21.12, 10.74, 3.58, 9.91, 15.84, 21.55.]
Table 1 quantitatively demonstrates that the diffusion-based model consistently outperforms SyNoRiM and its heuristic refinement on the metric of vertex error Ev and reasonably outperforms in bidirectional point-to-plane errors Ept and Eps. The error metrics in Table 1 may be defined as:
where Ept and Eps indicate the surface-level alignment between the predicted shape and the ground-truth shape. A low Ev indicates an accurate registration. While low point-to-plane errors Ept and Eps are necessary, they are not by themselves sufficient for an accurate registration. Table 1 shows that a shape prior generated according to embodiments is more effective than baseline data-driven shape priors. Although the PCA model achieves Ev results comparable to those of the diffusion model of embodiments on the skirt data, it shows significantly worse point-to-plane errors Ept and Eps. As a linear model, PCA may not be suitable for this inherently nonlinear problem and is not flexible enough to achieve accurate surface-level alignment. It cannot fit large deformations that are far from the mean shape (as shown in
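The distinction between vertex error and point-to-plane error can be made concrete with a small sketch. The definitions below are assumptions based on common usage (mean per-vertex Euclidean distance, and mean absolute distance along the normal of the nearest target point); they are not the equations of the disclosure, and the function names are hypothetical. The opposite direction of the point-to-plane error would be obtained by swapping the source and target arguments.

```python
import math


def vertex_error(pred, gt):
    """Mean Euclidean distance between corresponding vertices (assumed form of Ev)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def point_to_plane_error(src, dst, dst_normals):
    """Mean absolute distance from each source point to the plane through its
    nearest target point (one direction of an assumed point-to-plane error)."""
    total = 0.0
    for p in src:
        j = min(range(len(dst)), key=lambda k: math.dist(p, dst[k]))
        total += abs(sum((a - b) * n for a, b, n in zip(p, dst[j], dst_normals[j])))
    return total / len(src)


# A flat ground-truth patch and a prediction that has slid 0.3 units along it.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
gt_normals = [(0.0, 0.0, 1.0)] * 3
pred = [(0.3, 0.0, 0.0), (1.3, 0.0, 0.0), (2.3, 0.0, 0.0)]

ev = vertex_error(pred, gt)                        # about 0.3: correspondences are off
e_plane = point_to_plane_error(pred, gt, gt_normals)  # 0.0: the surfaces still coincide
```

The prediction lies exactly on the ground-truth surface, so its point-to-plane error is zero even though every vertex is displaced, which is why low surface-level error alone does not establish an accurate registration.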
Table 2 below shows the impact of varying the guidance breakpoint τ (in Equation (6)) on the t-shirt sequence.
Table 2 shows that a large guidance breakpoint τ significantly impairs the performance, indicating that the first stage of the guidance framework (i.e., coarse registration guidance 330) plays a key role and is necessary for this instance. When τ is in a reasonable range, it does not significantly affect the performance, although a τ that is too small amplifies the influence of error from the coarse registration. Embodiments outperform both optimization-based and learning-based non-rigid registration methods for both interpolation and extrapolation tests (as shown in Tables 1 and 2).
Registrations 1210 are based on the data prediction network and registrations 1220 are based on the noise prediction network. The data prediction network has difficulty modeling high-frequency signals like wrinkles. It cannot enforce continuity across the seams even if seam stitching is applied. Therefore, data prediction cannot represent fine details like wrinkles. In contrast, the noise prediction network generates the registrations 1220 including fine details and accurate wrinkle deformations, proving to be more effective.
In real-world scenarios, clothing usually comes with textures that make visual keypoint tracking possible. Systems and methods of embodiments can take advantage of such texture information when it is available.
Sparse ground-truth guidance may be provided by perfectly accurate sparse texture tracking in scans. In some embodiments, a sparse ground-truth guidance module may replace the coarse registration. The sparse ground-truth guidance module may randomly select Nk vertices from the ground-truth mesh (e.g., surfaces 306) and use the Nk vertices to compute the distance (defined in Equation (6)) in the first stage of the guidance framework (with t≥τ), thereby replacing the coarse registration module 308. Eliminating the need for the coarse registration also removes any potential error introduced by that module.
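The sparse selection described above can be sketched as follows. This is an illustration under stated assumptions: Equation (6) is not reproduced here, so a sum of squared Euclidean distances restricted to the selected vertices is assumed, and the function names and the fixed random seed are hypothetical.

```python
import random


def sample_keypoints(num_vertices, n_k, seed=0):
    """Randomly select n_k distinct vertex indices (seeded for repeatability)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_vertices), n_k))


def sparse_guidance_distance(x, gt, indices):
    """Sum of squared distances between x and gt, restricted to selected vertices
    (an assumed stand-in for the first-stage distance with t >= tau)."""
    return sum(sum((a - b) ** 2 for a, b in zip(x[i], gt[i])) for i in indices)


# Toy usage: five ground-truth vertices and an estimate offset by 1 along z.
gt_vertices = [(float(i), 0.0, 0.0) for i in range(5)]
estimate = [(float(i), 0.0, 1.0) for i in range(5)]

keypoints = sample_keypoints(len(gt_vertices), n_k=2)
distance = sparse_guidance_distance(estimate, gt_vertices, keypoints)
```

Only the Nk selected vertices contribute to the guidance signal, so in this toy case each of the two keypoints adds a squared offset of 1.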
Table 3 below is a comparison of the performance of sparse ground-truth guidance. Table 3 demonstrates that embodiments disclosed herein perform well with very sparse keypoint tracking signals, and that accuracy improves as the number of sparse keypoints increases.
In Table 3, PCA performs similarly to the original setting in Table 1, while the performance of compositional VAE significantly decreases. Similarly, as shown in
Computer system 1400 (e.g., client 110 and server 130) includes a bus 1408 or other communication mechanism for communicating information, and a processor 1402 (e.g., processors 212) coupled with bus 1408 for processing information. By way of example, the computer system 1400 may be implemented with one or more processors 1402. Processor 1402 may be a general-purpose microprocessor, a microcontroller, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a state machine, gated logic, discrete hardware components, or any other suitable entity that can perform calculations or other manipulations of information.
Computer system 1400 can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them stored in an included memory 1404 (e.g., memories 220), such as a Random Access Memory (RAM), a Flash Memory, a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable PROM (EPROM), registers, a hard disk, a removable disk, a CD-ROM, a DVD, or any other suitable storage device, coupled to bus 1408 for storing information and instructions to be executed by processor 1402. The processor 1402 and the memory 1404 can be supplemented by, or incorporated in, special purpose logic circuitry.
The instructions may be stored in the memory 1404 and implemented in one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, the computer system 1400, and according to any method well-known to those of skill in the art, including, but not limited to, computer languages such as data-oriented languages (e.g., SQL, dBase), system languages (e.g., C, Objective-C, C++, Assembly), architectural languages (e.g., Java, .NET), and application languages (e.g., PHP, Ruby, Perl, Python). Instructions may also be implemented in computer languages such as array languages, aspect-oriented languages, assembly languages, authoring languages, command line interface languages, compiled languages, concurrent languages, curly-bracket languages, dataflow languages, data-structured languages, declarative languages, esoteric languages, extension languages, fourth-generation languages, functional languages, interactive mode languages, interpreted languages, iterative languages, list-based languages, little languages, logic-based languages, machine languages, macro languages, metaprogramming languages, multiparadigm languages, numerical analysis, non-English-based languages, object-oriented class-based languages, object-oriented prototype-based languages, off-side rule languages, procedural languages, reflective languages, rule-based languages, scripting languages, stack-based languages, synchronous languages, syntax handling languages, visual languages, Wirth languages, and XML-based languages. Memory 1404 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1402.
A computer program as discussed herein does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
Computer system 1400 further includes a data storage device 1406 such as a magnetic disk or optical disk, coupled to bus 1408 for storing information and instructions. Computer system 1400 may be coupled via input/output module 1410 to various devices. Input/output module 1410 can be any input/output module. Exemplary input/output modules 1410 include data ports such as USB ports. The input/output module 1410 is configured to connect to a communications module 1412. Exemplary communications modules 1412 (e.g., communications modules 218) include networking interface cards, such as Ethernet cards and modems. In certain aspects, input/output module 1410 is configured to connect to a plurality of devices, such as an input device 1414 (e.g., input device 214) and/or an output device 1416 (e.g., output device 216). Exemplary input devices 1414 include a keyboard and a pointing device, e.g., a mouse or a trackball, by which a user can provide input to the computer system 1400. Other kinds of input devices 1414 can be used to provide for interaction with a user as well, such as a tactile input device, visual input device, audio input device, or brain-computer interface device. For example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, tactile, or brain wave input. Exemplary output devices 1416 include display devices, such as an LCD (liquid crystal display) monitor, for displaying information to the user.
According to one aspect of the present disclosure, the client device 110 and server 130 can be implemented using a computer system 1400 in response to processor 1402 executing one or more sequences of one or more instructions contained in memory 1404. Such instructions may be read into memory 1404 from another machine-readable medium, such as data storage device 1406. Execution of the sequences of instructions contained in main memory 1404 causes processor 1402 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in memory 1404. In alternative aspects, hard-wired circuitry may be used in place of or in combination with software instructions to implement various aspects of the present disclosure. Thus, aspects of the present disclosure are not limited to any specific combination of hardware circuitry and software.
Various aspects of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. The communication network (e.g., network 150) can include, for example, any one or more of a LAN, a WAN, the Internet, and the like. Further, the communication network can include, but is not limited to, for example, any one or more of the following network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, tree or hierarchical network, or the like. The communications modules can be, for example, modems or Ethernet cards.
Computer system 1400 can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. Computer system 1400 can be, for example, and without limitation, a desktop computer, laptop computer, or tablet computer. Computer system 1400 can also be embedded in another device, for example, and without limitation, a mobile telephone, a PDA, a mobile audio player, a Global Positioning System (GPS) receiver, a video game console, and/or a television set top box.
The term “machine-readable storage medium” or “computer-readable medium” as used herein refers to any medium or media that participates in providing instructions to processor 1402 for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as data storage device 1406. Volatile media include dynamic memory, such as memory 1404. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires forming bus 1408. Common forms of machine-readable media include, for example, floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH EPROM, any other memory chip or cartridge, or any other medium from which a computer can read. The machine-readable storage medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them.
To illustrate the interchangeability of hardware and software, items such as the various illustrative blocks, modules, components, methods, operations, instructions, and algorithms have been described generally in terms of their functionality. Whether such functionality is implemented as hardware, software, or a combination of hardware and software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application.
As used herein, the phrase “at least one of” preceding a series of items, with the terms “and” or “or” to separate any of the items, modifies the list as a whole, rather than each member of the list (i.e., each item). The phrase “at least one of” does not require selection of at least one item; rather, the phrase allows a meaning that includes at least one of any one of the items, and/or at least one of any combination of the items, and/or at least one of each of the items. By way of example, the phrases “at least one of A, B, and C” or “at least one of A, B, or C” each refer to only A, only B, or only C; any combination of A, B, and C; and/or at least one of each of A, B, and C.
To the extent that the term “include,” “have,” or the like is used in the description or the claims, such term is intended to be inclusive in a manner similar to the term “comprise” as “comprise” is interpreted when employed as a transitional word in a claim. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
A reference to an element in the singular is not intended to mean “one and only one” unless specifically stated, but rather “one or more.” All structural and functional equivalents to the elements of the various configurations described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and intended to be encompassed by the subject technology. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the above description. No clause element is to be construed under the provisions of 35 U.S.C. § 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or, in the case of a method clause, the element is recited using the phrase “step for.”
While this specification contains many specifics, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of particular implementations of the subject matter. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
The subject matter of this specification has been described in terms of particular aspects, but other aspects can be implemented and are within the scope of the following claims. For example, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. The actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the aspects described above should not be understood as requiring such separation in all aspects, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. Other variations are within the scope of the following claims.
It should be understood that the original applicant herein determines which technologies to use and/or productize based on their usefulness and relevance in a constantly evolving field, and what is best for it and its players and users. Accordingly, it may be the case that the systems and methods described herein have not yet been and/or will not later be used and/or productized by the original applicant. It should also be understood that implementation and use, if any, by the original applicant, of the systems and methods described herein are performed in accordance with its privacy policies. These policies are intended to respect and prioritize player privacy, and to meet or exceed government and legal requirements of respective jurisdictions. To the extent that such an implementation or use of these systems and methods enables or requires processing of user personal information, such processing is performed (i) as outlined in the privacy policies; (ii) pursuant to a valid legal mechanism, including but not limited to providing adequate notice or where required, obtaining the consent of the respective user; and (iii) in accordance with the player or user's privacy settings or preferences. It should also be understood that the original applicant intends that the systems and methods described herein, if implemented or used by other entities, be in compliance with privacy policies and practices that are consistent with its objective to respect players and user privacy.
The present disclosure is related to and claims priority under 35 U.S.C. § 119(e) to U.S. Prov. Application No. 63/454,000, entitled DIFFUSION SHAPE PRIOR FOR WRINKLE-ACCURATE CLOTH REGISTRATION to SAITO, et al., filed on Mar. 22, 2023, the contents of which are hereby incorporated by reference in their entirety, for all purposes.
Number | Date | Country
---|---|---
63454000 | Mar 2023 | US