The invention relates to a system and a method for digitizing the human body under clothing. Machine learning techniques and optimization algorithms in applied simulation technologies are utilized for this invention.
The invention, which concerns reconstructing a human body under clothing, presents a new method for designing and building a digital version of the human body. Traditional methods typically use a 3D scanning system based on technologies such as Laser Triangulation, Photogrammetry and Structured Light for 3D digitalization of the human body. These systems exploit users' image data or point cloud data obtained by depth cameras to build digitalized versions of people. An overview of the traditional method is shown in
However, traditional methods face noticeable challenges. Firstly, the person being digitalized is required to wear tight clothes so that his actual body shape can be captured, which makes the 3D body scanning process inconvenient, time-consuming and impractical. Secondly, current methods can only create the 3D human shape and extract its measurements; they are almost incapable of simulating its movement, which is essential for practical applications. Therefore, a method for digitalizing the human body that allows the digitalized person to wear casual outfits and simulates not only his shape but also his pose and movement is necessary to better satisfy actual requirements.
Thirdly, traditional methods require time for data processing. In particular, with Laser Triangulation technology, the point clouds obtained after scanning need to be processed by specific software to create a 3D model, which is very time-consuming. Fourthly, installing Photogrammetry and Structured Light systems is time-consuming and costly (about $100,000). Finally, 3D body scanning systems that use special lighting to capture different sides of the body simultaneously can be hazardous to human health. Taking all of the above problems into account, machine learning techniques are presented to increase processing speed, reduce implementation costs, optimize space utilization and protect the digitalized person from harmful lighting. These techniques are expected to have wide application in various fields.
The first purpose of the invention is to propose a system for digitalizing the human body shape under clothing based on machine learning techniques and optimization algorithms on RGB image data. In this system, machine learning techniques are used to: first, classify and segment clothing regions; second, estimate skeleton joint locations and postures; third, detect the human region and the background region in the image; and fourth, ensure the proportions of human body parts according to the person's race. The optimization algorithm is used to generate three-dimensional human body data that matches the information obtained from the image.
To achieve the above purpose, the proposed system and method include two main modules, (1) a Pre-processing Module and (2) an Optimization Module, and two supplementary blocks, (1) an Input Block and (2) an Output Block. In particular, the Pre-processing Module collects image data and image information for the Optimization Module. Specifically, the Pre-processing Module includes four components as follows: (1) Image Standardization Block: standardizing input images for processing in the next steps; (2) Clothes Classification and Segmentation Block: using machine learning techniques to identify, classify and locate clothes appearing in the RGB images; (3) Human Pose Estimation Block: using machine learning methods to recognize the human posture in the inputted standardized image; (4) Cloth-Skin Displacement Block: using the cloth-skin displacement probability distribution of different types of clothing to estimate the distance between the clothes and the human skin surface.
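The following minimal sketch, in Python, illustrates the kind of information the Pre-processing Module passes to the Optimization Module. The container and field names are assumptions introduced here for illustration only; they are not interfaces disclosed by the invention.

from dataclasses import dataclass
import numpy as np

@dataclass
class PreprocessingResult:
    image: np.ndarray              # standardized RGB image (Image Standardization Block)
    clothes_labels: np.ndarray     # per-pixel clothes class map (Clothes Classification and Segmentation Block)
    joints_2d: np.ndarray          # (J, 2) joint locations (Human Pose Estimation Block)
    cloth_skin_displacement: dict  # expected cloth-to-skin distance per clothes type (Cloth-Skin Displacement Block)

# Toy instance standing in for the four blocks' outputs on a 4x4 image.
result = PreprocessingResult(
    image=np.zeros((4, 4, 3), dtype=np.uint8),
    clothes_labels=np.zeros((4, 4), dtype=np.int64),
    joints_2d=np.zeros((17, 2)),
    cloth_skin_displacement={"outerwear": 2.3},
)
print(result.joints_2d.shape)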
The posture, clothing type and distance distribution information from the Pre-processing Module are the input data for the Optimization Module. The Optimization Module consists of two main components: (1) Human Parametric Model: simulating various human shapes and poses via parameters controlling the shape (tall, short, thin, fat . . . ) and parameters controlling the pose (standing, sitting, arms spreading . . . ), thereby morphing a parametric 3D model into a real human 3D model; (2) Human Parametric Optimization: optimizing the pose and shape parameters according to the information received from the Pre-processing Module to transform the parametric model into a model approximating the real human shape.
The second purpose of the invention is to propose a method for digitalizing a human body shape under clothing based on machine learning and optimization algorithms on RGB image data. To this end, the proposed method consists of four steps: (1) Step 1: collecting dressed-human images; (2) Step 2: standardizing and extracting image information; (3) Step 3: developing a parametric model and optimizing its parameters; (4) Step 4: displaying the digitized human body model.
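A minimal sketch of how these four steps could be chained is given below. Every function name in it is a hypothetical placeholder standing in for a block described elsewhere in this document, not an implementation disclosed by the invention.

def digitize_human_body(capture_image, preprocess, optimize_parameters, display):
    raw_image = capture_image()               # Step 1: collect a dressed-human image
    info = preprocess(raw_image)              # Step 2: standardize and extract image information
    beta, theta = optimize_parameters(info)   # Step 3: fit the parametric model's shape and pose
    display(beta, theta)                      # Step 4: display the digitized human body model

# Toy usage with trivial stand-ins so the sketch runs end to end.
digitize_human_body(
    capture_image=lambda: "image",
    preprocess=lambda img: {"joints": [], "segmentation": None},
    optimize_parameters=lambda info: ([0.0] * 10, [0.0] * 72),
    display=lambda beta, theta: print(len(beta), len(theta)),
)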
As shown in
In this invention, the following terms are construed as below:
“Digitized human body model” or “digital human model” is data that uses rules for mesh points and mesh surfaces to represent the three-dimensional shape of a real person's body; all body dimensions are preserved from the real body. In addition, a digital human model also utilizes reference key points to represent human joints, thereby controlling the posture of the digital human model. This data is saved in the FBX format, a format used to exchange 3D geometry and animation data. FBX files can store various data, including bones, meshes, lighting, cameras and geometry, to complete animation scenes. This file format supports geometry and appearance-related properties such as color and texture. It also supports skeletal animations and morphs. Both binary and ASCII files are supported.
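For illustration of the data this definition implies, the following minimal in-memory representation holds the mesh and the reference joints before export; the class layout is an assumption made here, not the invention's file format, and writing the .fbx file itself would typically be delegated to an external tool such as the FBX SDK or Blender and is not shown.

from dataclasses import dataclass
import numpy as np

@dataclass
class DigitalHumanModel:
    vertices: np.ndarray   # (N, 3) mesh points in real-world units, so body sizes are preserved
    faces: np.ndarray      # (M, 3) vertex indices defining the mesh surfaces
    joints: np.ndarray     # (J, 3) reference key points used to control the posture

    def validate(self):
        assert self.vertices.ndim == 2 and self.vertices.shape[1] == 3
        assert self.faces.max() < len(self.vertices)

# Toy instance: a single triangle with one joint.
body = DigitalHumanModel(
    vertices=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    faces=np.array([[0, 1, 2]]),
    joints=np.array([[0.3, 0.3, 0.0]]),
)
body.validate()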
“Human joint” is a point physically connecting bones in the body to form a complete skeletal system of a functional human body.
“Clothes classification and segmentation” is a process that classifies the clothes type/label, background, skin and hair, and identifies their corresponding regions in the image.
“Clothes type” or “type of clothing” in the proposed technique includes 11 categories: image background, skin, hair, innerwear, outerwear, skirt, dress, pants, shoes, bag, and others.
“Cloth-skin displacement probability distribution” is the statistical distribution of the distance between the clothing surface of each clothes type and the human skin surface.
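The invention does not disclose the parametric form of this distribution. As a hedged illustration only, a per-clothes-type distribution could be summarized by the mean and standard deviation of measured cloth-to-skin distances, as in the sketch below; the class names follow the definition above and the numeric samples are placeholders.

import numpy as np

# Measured cloth-to-skin distances (e.g., in cm) per clothes type; placeholder samples.
samples = {
    "innerwear": np.array([0.3, 0.5, 0.4, 0.6]),
    "outerwear": np.array([2.0, 2.5, 1.8, 3.0]),
    "dress":     np.array([3.5, 4.0, 2.8, 5.1]),
}

# Summarize each type's displacement distribution by its mean and standard deviation.
displacement_model = {c: (float(d.mean()), float(d.std())) for c, d in samples.items()}

def expected_displacement(clothes_type):
    # Expected cloth-to-skin distance used as a prior when estimating the body under clothing.
    mean, _std = displacement_model[clothes_type]
    return mean

print(expected_displacement("outerwear"))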
“Machine learning techniques” used in the proposed method are techniques which, firstly, extract image characteristics and, secondly, learn models for predicting, classifying, determining and constraining properties including the type of clothing, the clothing region, the human and background regions in the image, the locations of joints and the person's race.
“Optimization algorithms” refer to algorithms that adjust the pose and shape parameters to morph a human parametric model so that it matches the body information obtained from the image.
A “human parametric model” is a model that can simulate various human shapes and poses via shape parameters controlling the shape (tall, short, thin, fat . . . ) and pose parameters controlling the pose (standing, sitting, arms spreading . . . ). It defines rules for the number of mesh points, the type of meshes, the indexing of mesh surfaces and the locations of joint points that the digitized human body has to comply with.
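The internal structure of the parametric model is not disclosed beyond this definition. As a hedged sketch only, a minimal fixed-topology model in the spirit of linear blend-shape body models (all array sizes, names and random values below are assumptions for illustration) could look like this; the pose parameters θ, which would rotate the mesh about the regressed joints, are omitted from the sketch.

import numpy as np

class ToyParametricModel:
    # Minimal fixed-topology parametric body: template mesh plus linear shape blend shapes.
    def __init__(self, n_vertices=6890, n_shape=10, n_joints=17, seed=0):
        rng = np.random.default_rng(seed)
        self.template = rng.normal(size=(n_vertices, 3))               # rest-pose mesh (toy values)
        self.shape_dirs = rng.normal(size=(n_vertices, 3, n_shape))    # shape blend shapes
        self.joint_regressor = rng.random(size=(n_joints, n_vertices))
        self.joint_regressor /= self.joint_regressor.sum(axis=1, keepdims=True)

    def vertices(self, beta):
        # Morph the template by the shape parameters (tall, short, thin, fat ...).
        return self.template + self.shape_dirs @ beta

    def joints(self, beta):
        # Regress 3D joint locations from the morphed vertices.
        return self.joint_regressor @ self.vertices(beta)

model = ToyParametricModel()
beta = np.zeros(10)
print(model.vertices(beta).shape, model.joints(beta).shape)   # (6890, 3) (17, 3)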
Input Block
The main function of the Input Block is to collect color images taken by hardware devices such as cameras, camcorders, IP cameras, smartphones, scanners or any other devices that can capture a color image. These images are the raw data for the Pre-processing Module before the human body digitization is performed.
Pre-Processing Module
Referring to
The first extractor block (called Clothes Classification and Segmentation) is developed using machine learning techniques to classify the clothes type and identify its position in the image. Machine learning techniques are applied to learn clothes classification and segmentation on a large dataset of images with defined clothes regions and their name tags. The learned model is then able to reliably predict the clothes type and position in a new image. In this block, 11 specific classes are classified and identified, corresponding to the clothes types defined above: image background, skin, hair, innerwear, outerwear, skirt, dress, pants, shoes, bag, and others.
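The invention does not name the network that performs this classification and segmentation. The sketch below uses an off-the-shelf semantic segmentation architecture (DeepLabV3 from torchvision) purely as an assumed stand-in configured for the 11 classes listed above; in practice the model would be trained on the large annotated clothes dataset described here.

import torch
from torchvision.models.segmentation import deeplabv3_resnet50

CLOTHES_CLASSES = ["background", "skin", "hair", "innerwear", "outerwear",
                   "skirt", "dress", "pants", "shoes", "bag", "others"]

# Untrained network with one output channel per class (weights omitted for the sketch).
model = deeplabv3_resnet50(weights=None, weights_backbone=None,
                           num_classes=len(CLOTHES_CLASSES)).eval()

image = torch.rand(1, 3, 512, 512)        # a standardized RGB image (toy input)
with torch.no_grad():
    logits = model(image)["out"]          # (1, 11, 512, 512) class scores per pixel
label_map = logits.argmax(dim=1)          # per-pixel clothes class index
print(label_map.shape)                    # torch.Size([1, 512, 512])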
The second extractor block (called Human Pose Estimation) uses the same approach as the first block to identify the joints of the subject in the standardized image, including the head, neck, shoulders (left, right), elbows (left, right), wrists (left, right), spine, hips (left, right), knees (left, right), ankles (left, right) and feet (left, right). The identified joint positions are used to reconstruct the human pose.
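The specific pose-estimation network is likewise not disclosed; what matters for the later optimization is the joint set. The hedged sketch below simply fixes an ordering for the joints named above and converts a detector's raw (x, y) output into a named dictionary; the detector itself is assumed to exist and is not shown.

import numpy as np

JOINT_NAMES = (
    "head", "neck",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "spine",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_foot", "right_foot",
)

def joints_to_dict(keypoints_xy):
    # Map a (17, 2) array of detected 2D joints to {joint_name: (x, y)}.
    keypoints_xy = np.asarray(keypoints_xy, dtype=float)
    assert keypoints_xy.shape == (len(JOINT_NAMES), 2)
    return {name: tuple(xy) for name, xy in zip(JOINT_NAMES, keypoints_xy)}

# Toy usage with random detections standing in for a real pose estimator's output.
print(joints_to_dict(np.random.rand(len(JOINT_NAMES), 2))["neck"])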
The third extractor block (called Cloth-Skin Displacement Model) is built based on the cloth-skin displacement probability distribution of each clothes type. The purpose of this block is to estimate the distance between the clothes and the skin, thereby estimating the human shape under clothing more accurately. The cloth-skin displacement model is likewise developed using a large dataset (paired data of people with and without clothes).
Optimization Module
As illustrated in
Output Block
The main function of the Output Block is to display the final result in the form of a mesh model (.fbx) following the standard for the number of vertices and faces. The final result can be shown on a computer screen, projector screen or other similar hardware devices.
Referring to
Step 1: Collecting Dressed-Human Images
In this step, dressed-human images are taken by hardware devices (such as cameras). The collected images are then sent to the Pre-processing Module for information extraction in Step 2.
Step 2: Standardizing and Extracting Image Information
The input images are adjusted according to several standards, such as image size, brightness, distortion and topological uniformity. The intrinsic (internal) and extrinsic (external) camera parameters are determined as well.
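One hedged way to realize this standardization with standard image-processing tools is sketched below. OpenCV is an illustrative choice rather than a disclosed part of the invention, and the camera matrix and distortion coefficients are placeholders for the intrinsic and extrinsic parameters this step determines.

import cv2
import numpy as np

def standardize(image_bgr, size=(512, 512)):
    # Resize, undistort and normalize the brightness of an input image (illustrative only).
    # Placeholder intrinsics and distortion; in practice these come from camera calibration.
    camera_matrix = np.array([[500.0, 0.0, image_bgr.shape[1] / 2],
                              [0.0, 500.0, image_bgr.shape[0] / 2],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(5)
    undistorted = cv2.undistort(image_bgr, camera_matrix, dist_coeffs)
    resized = cv2.resize(undistorted, size)
    # Simple brightness normalization: equalize the luminance channel.
    ycrcb = cv2.cvtColor(resized, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

print(standardize(np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)).shape)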
The first extractor block (called Clothes Classification and Segmentation) uses machine learning techniques to classify and segment the clothes in the inputted standardized images. These machine learning algorithms are developed by training on a large dataset of images with defined cloth regions and labels, so that they automatically identify similar regions and labels when processing a new input image. There are 11 labeled regions, corresponding to the clothes types defined above: image background, skin, hair, innerwear, outerwear, skirt, dress, pants, shoes, bag, and others.
The second extractor block (called Human Pose Estimation) uses the same approach as the first block to identify the joints of the subject in the standardized image, including the head, neck, shoulders (left, right), elbows (left, right), wrists (left, right), spine, hips (left, right), knees (left, right), ankles (left, right) and feet (left, right). The acquired joint positions are used to reconstruct the human pose.
The third extractor block (called Cloth-Skin Displacement Model) is built based on the cloth-skin displacement probability distribution of each clothes type. The purpose of this block is to estimate the distance between the clothes and the skin, thereby estimating the human shape under clothing more accurately.
Step 3: Parameterizing and Optimizing the Human Parametric Model
Given the joint locations, the clothes classification and segmentation, and the displacement probability distribution for each clothes type identified in the previous step, this step determines the parameters of the 3D human model so that its pose and shape satisfy the information from the Pre-processing Module. The optimization is performed by minimizing the objective function E(β, θ) as follows:
E(β, θ) = λ_J E_J(β, θ, K, J_est) + λ_S E_S(β, θ) + λ_C E_C(β, θ)

In which:

E_J(β, θ, K, J_est): the 2D distance between the joint locations of the real human in the image, determined by the Pre-processing Module, and the projection of the 3D joints of the human parametric model. Π_K is the perspective projection of the three-dimensional joints onto the image, and K denotes the camera parameters.

E_S(β, θ): the penalty error between the boundary contour of the real human and the boundary contour of the projected SMPL model. Here c ∈ C, where C is the set of cloth segmentation labels, C = {skirt, skin, hair, . . . }; p_c denotes points on the boundary contour of part c in the input image; NN_SMPL,c(p_c) denotes the point on the boundary contour of the projected SMPL model that is nearest to p_c; and n_c denotes the number of points on the boundary contour of part c.

E_C(β, θ): the displacement between the human skin contour and the cloth contour. d_p is the 2D distance between a point on the human skin contour and the corresponding point on the cloth contour for cloth type c and sample point p.
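The explicit forms of the three terms are not reproduced in the text above. The following LaTeX fragment gives one plausible reconstruction that is consistent with the verbal definitions and with common SMPL-fitting practice; the squared penalties and the symbol \hat{d}_c (the expected cloth-skin displacement for cloth type c) are assumptions introduced here, not the invention's exact formulas.

\begin{aligned}
E_J(\beta,\theta,K,J_{\mathrm{est}}) &= \sum_{i} \bigl\lVert \Pi_K\bigl(J_i(\beta,\theta)\bigr) - J_{\mathrm{est},i} \bigr\rVert^{2},\\
E_S(\beta,\theta) &= \sum_{c\in C} \frac{1}{n_c} \sum_{p_c} \bigl\lVert p_c - NN_{\mathrm{SMPL},c}(p_c) \bigr\rVert^{2},\\
E_C(\beta,\theta) &= \sum_{c\in C} \sum_{p} \bigl(d_p - \hat{d}_c\bigr)^{2}.
\end{aligned}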
The objective function is minimized by applying a derivative-free optimization method.
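A minimal sketch of this minimization is given below. The three term functions are toy quadratic placeholders standing in for E_J, E_S and E_C (which would in reality be computed from the Pre-processing Module outputs), the weights and parameter sizes are illustrative, and the choice of Nelder-Mead is an assumption, since the text does not name a specific derivative-free method.

import numpy as np
from scipy.optimize import minimize

N_SHAPE, N_POSE = 10, 72                      # assumed sizes of beta and theta
L_J, L_S, L_C = 1.0, 0.5, 0.1                 # illustrative weights lambda_J, lambda_S, lambda_C

rng = np.random.default_rng(0)
target = rng.normal(size=N_SHAPE + N_POSE)    # stands in for image-derived evidence

def e_joint(x):   return float(np.sum((x - target) ** 2))     # placeholder for E_J
def e_contour(x): return float(np.sum(x[:N_SHAPE] ** 2))      # placeholder for E_S
def e_cloth(x):   return float(np.sum(np.abs(x[N_SHAPE:])))   # placeholder for E_C

def objective(x):
    return L_J * e_joint(x) + L_S * e_contour(x) + L_C * e_cloth(x)

x0 = np.zeros(N_SHAPE + N_POSE)               # initial shape and pose parameters
res = minimize(objective, x0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-4, "fatol": 1e-4})
beta, theta = res.x[:N_SHAPE], res.x[N_SHAPE:]
print(res.success, objective(res.x))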
Step 4: Displaying the 3D Model of the Human Body
In this step, the final result, in the form of a mesh model (.fbx) following the standard for the number of vertices and faces, can be shown on a computer screen, projector screen or other similar hardware devices.
Number | Date | Country | Kind |
1-2020-03069 | May 2020 | VN | national |