Today, e-commerce platforms are fundamentally changing how people shop. More and more customers opt to shop online rather than visit stores in person. Market research shows that, in the US alone, e-commerce retailers made $322 billion in sales revenue in 2016, and the figure is expected to rise to $4 trillion by 2020. Not only does shopping online save a lot of time, it is also a very different user experience. E-commerce platforms let customers connect to millions of sellers, browse billions of products, view ratings and reviews of sellers and products, and choose higher-quality products at lower cost. Online selling significantly cuts the cost of maintaining stores, warehouses, and a workforce, and competition between sellers further enables very competitive prices and superior products. E-commerce platforms strive to provide users a personalized experience because it makes it easy to retarget or remarket to customers. Machine learning is used extensively by such platforms to make recommendations and advertise suggested products related to what customers are looking to buy, helping them and essentially hooking them to the platform. Machine learning makes it possible to learn models for tasks that are very hard to design algorithms for, and complements tedious algorithmic tasks.
One of the fundamental disadvantages of e-commerce platforms is that customers cannot try a product before they buy it. Return rates for fashion and clothing accessories are very high because customers do not get what they expect. Most e-commerce platforms are starting to add Augmented Reality (AR) and Virtual Reality (VR) elements to their products so customers can try products virtually, which helps considerably in making decisions. The current innovation combines photogrammetry principles with machine learning models that fill the gaps in algorithmically intensive tasks, improving the accuracy of such decisions. It enables accurately trying products on 3D characters virtually, rather than in-camera as in AR. It further helps customers with fashion discovery, fashion research, and better decision-making, and increases customer engagement and sales for the platform.
The patent describes an innovation that addresses this problem for online clothing buyers by allowing users to try clothes on their 3D characters or avatars to see whether, and how well, the clothes will fit and look on them. The solution provides a web and mobile interface that lets users capture 360-degree video of their body. Photogrammetry principles are combined with machine learning models to construct a 3D projection of the images, called a 3D model, followed by modeling and character building to produce a 3D character or avatar. Body measurements are derived from the 3D constructions and photogrammetry and then further refined by various subsystems. Cloth measurements are derived from clothing databases or provided by clothing merchandise websites. The interface lets users build their 3D models and then 3D characters that can be used for applications such as academics, research, 3D printing, designing decorative items, use as a web avatar, mobile and desktop “live desktop” and active screensavers, and gaming and online-world applications.
The solution provides a RESTful service that allows clothing merchandise websites to send cloth parameters, such as dimensions and texture, together with a user identifier through a web client. The solution identifies the user from the identifier, compares the dimensions of the user's body and the clothes, and performs a virtual fitting that generates the fitting information. Body measurements are compared with cloth measurements to decide whether the fit is loose, tight, or good at different key body measurements. Both the character and the fitting information are sent to the User Output 105 web client, which displays the fitting information mapped onto the character in a 3D viewer.
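As an illustration only, the following Python sketch shows how a merchandise website might call such a RESTful service; the endpoint URL, field names, and measurement keys are assumptions, not part of the innovation's actual interface.

    # Hypothetical client call to the virtual fitting service.
    import requests

    payload = {
        "user_id": "u-12345",  # identifier of the user's stored 3D character
        "cloth": {
            "chest_cm": 96.0,   # cloth dimensions
            "waist_cm": 80.0,
            "texture_url": "https://example.com/shirt-texture.png",
        },
    }
    resp = requests.post("https://api.example.com/virtual-fit", json=payload, timeout=30)
    resp.raise_for_status()
    fitting = resp.json()  # e.g. per-measurement labels: loose, tight, or good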
The solution described in the innovation consists of a fast, lightweight web or mobile frontend to interact with users and a cloud backend for heavy processing.
The Capturing Subsystem
The capturing subsystem 102 is responsible for capturing, from a camera, the information sufficient for generating the 3D mesh.
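A minimal sketch of such capture, assuming OpenCV is used to sample overlapping frames from the user's 360-degree video (the sampling rate is an illustrative choice, not a specified parameter):

    # Sample every n-th frame of the captured video so the photogrammetry
    # pipeline receives overlapping views of the body from all sides.
    import cv2

    def sample_frames(video_path, every_nth=10):
        frames = []
        cap = cv2.VideoCapture(video_path)
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % every_nth == 0:
                frames.append(frame)
            index += 1
        cap.release()
        return frames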
The 3D Construction Subsystem
The 3D construction subsystem 106 in the backend consists of a sparse cloud generation 301, dense cloud generation 302, surface refinement 303, surface reconstruction 304, and mesh texturing 305 pipeline, executed in order following photogrammetry principles to generate the point cloud.
Due to the high processing time, and to keep users from waiting indefinitely, a lambda architecture is used for dense point cloud generation and for other time-consuming 3D modeling processes.
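The sketch below illustrates one possible reading of this lambda-architecture split, using only the Python standard library: a fast speed layer returns a coarse preview immediately while the slow dense reconstruction is queued for a batch worker. The function names and the preview shape are placeholders.

    import queue
    import threading

    batch_jobs = queue.Queue()

    def run_sparse_preview(images):
        # Placeholder for a fast, coarse reconstruction (speed layer).
        return {"preview_points": len(images) * 100}

    def run_dense_reconstruction(images):
        # Placeholder for the slow dense cloud generation (batch layer).
        return {"dense_points": len(images) * 100000}

    def batch_worker():
        while True:
            user_id, images = batch_jobs.get()
            result = run_dense_reconstruction(images)
            print(f"stored dense cloud for {user_id}: {result}")  # placeholder store
            batch_jobs.task_done()

    def submit_reconstruction(user_id, images):
        # Return a quick preview now; queue the expensive job for later.
        batch_jobs.put((user_id, images))
        return run_sparse_preview(images)

    threading.Thread(target=batch_worker, daemon=True).start()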
The surface or mesh reconstruction step 304 estimates a mesh surface that best explains the input point cloud. The process may include open source components from Multi-View Environment (MVE), an end-to-end pipeline for image-based geometry reconstruction covering structure-from-motion, multi-view stereo, and surface reconstruction, for the point cloud generation, dense cloud generation, and surface reconstruction. The surface refinement step 303 recovers fine details after the initial surface reconstruction. The innovation uses the open-source component COLMAP, a Structure-from-Motion (SfM) and Multi-View Stereo library, for point cloud generation, dense cloud generation, surface reconstruction, and surface refinement. Heterogeneous computing with CUDA is used to speed up the process. The mesh texturing step 305 computes a sharp and accurate texture to color the mesh and generate real-world 3D models. The innovation uses MESHCON, MVS-texturing, and MeshLab for mesh reconstruction and texturing.
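Since COLMAP ships a command-line interface, one plausible integration is invoking its automatic reconstruction from the backend as a subprocess; the paths below are placeholders, and the exact flags may differ across COLMAP versions.

    # Run COLMAP's end-to-end reconstruction over the sampled frames.
    import subprocess

    subprocess.run(
        [
            "colmap", "automatic_reconstructor",
            "--workspace_path", "/data/reconstruction",  # output workspace
            "--image_path", "/data/frames",              # input images
        ],
        check=True,
    )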
At steps 301, 302, 303, 304, and 305, the output is checked by corresponding CNN-based rejecter classifier model(s) 307, and the transfer to the next step is conditional on the quality of the output. If the output at any step in the pipeline is rejected by the classifier 307, the step is repeated with tuned processing levels to further improve the quality. This approach reduces wasted processing power compared to the case where the output is found to be poor only at the end of the pipeline. The intelligent processing tuning is performed by generative adversarial network (GAN) based tuners 306 that keep learning and, in case of failure, try to apply just the minimum amount of processing required for the output to be accepted by the CNN for a given input. If a step cannot be corrected after 3 tuning cycles, the input is rejected.
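The control flow of this quality gate can be sketched as follows; here steps stands for the stages 301 through 305, while the rejecter and tuner objects stand in for the CNN classifier 307 and the GAN-based tuner 306, whose interfaces are assumed for illustration.

    MAX_TUNING_CYCLES = 3

    def run_pipeline(data, steps, rejecter, tuner):
        for step in steps:
            params = step.default_params
            for attempt in range(1 + MAX_TUNING_CYCLES):
                output = step.run(data, params)
                if rejecter.accepts(step.name, output):  # CNN quality gate 307
                    break
                if attempt == MAX_TUNING_CYCLES:
                    raise ValueError(f"input rejected at step {step.name}")
                # GAN tuner 306 proposes just enough extra processing to pass.
                params = tuner.tune(step.name, params, output)
            data = output
        return data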
The Character-Building Subsystem
The character-building subsystem 108 converts generated 3D models 404 into 3D characters or avatars 104 capable of performing real-world operations.
Hips—Spine—Chest—Shoulders—Arm—Forearm—Hand
Hips—Spine—Chest—Neck—Head
Hips—Up Leg—Leg—Foot—Toe—Toe End
Character skinning 503 is the process of attaching the mesh to the skeleton to make a 3D humanoid or character. It involves binding vertices in the 3D models to bones. The innovation uses the open source MakeHuman libraries for the complete character-building pipeline and the simulation of muscular movement, and Blender libraries for 3D pipeline modeling, rigging, animation, and cloth simulation. Manuel Bastioni LAB, an open source Blender plug-in, is used for the parametric 3D modeling of photorealistic humanoid characters. It includes both consolidated algorithms, such as 3D morphing, and experimental technologies, such as the fuzzy mathematics used to handle the relations between human parameters, the non-linear interpolation used to define age, mass, and tone, the auto-modeling engine based on body proportions, and the expert system used to recognize the bones in motion-capture skeletons.
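For illustration, the bone chains listed above can be represented as a parent-child hierarchy, with skinning expressed as per-vertex bone weights; the actual representations used by MakeHuman and Blender differ, so this is only a conceptual sketch.

    # Parent-child bone hierarchy for the three chains rooted at the hips.
    SKELETON = {
        "hips": None,  # root bone
        "spine": "hips", "chest": "spine",
        "shoulders": "chest", "arm": "shoulders",
        "forearm": "arm", "hand": "forearm",
        "neck": "chest", "head": "neck",
        "up_leg": "hips", "leg": "up_leg",
        "foot": "leg", "toe": "foot", "toe_end": "toe",
    }

    # Skinning: each mesh vertex is bound to one or more bones with weights
    # summing to 1, so bone motion deforms the nearby surface smoothly.
    vertex_weights = {
        101: {"forearm": 0.7, "hand": 0.3},  # a vertex near the wrist
        102: {"chest": 1.0},                 # a vertex on the torso
    }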
As in the 3D construction subsystem, the output of the initial 3D model and of steps 501, 502, and 503 is checked by corresponding CNN-based rejecter classifier model(s) 508, and the transfer to the next step is conditional on the quality of the output. If the output at any step in the pipeline is not accepted by the rejecter 508, the step is repeated with tuned processing levels. The intelligent processing tuning is performed by generative adversarial network (GAN) based tuners 509 that keep learning and, in case of failure, try to apply just the minimum amount of processing required for the output to be accepted by the CNN for a given input. If a step cannot be corrected after 3 tuning cycles, the input is rejected.
The Virtual Fitting Subsystem
The virtual fitting subsystem 111 uses algorithms and learned TensorFlow models to compare the dimensions of the clothes and the 3D model or character and to estimate the fit type, with information like narrow fit, loose fit, or good fit on various parts of the body. The information can be color mapped and shown as an overlay on top of the fitted clothing. Various algorithms are used for fitting, such as 3D clothing fitting based on geometric feature matching, single-shot body shape estimation, and surface metrics. The innovation uses the open source projects OpenFit and OpenKnit for virtual fitting. Cloth swapping is the process of changing clothes on the prepared 3D model 404 or character 104. The innovation uses the Valentina platform, an open source pattern drafting software designed to be the foundation of a new stack of open source tools to remake the garment industry, and Blender for cloth simulation, such as smoothing of cloth, cloth on armature, cloth with animated vertex groups, cloth with dynamic paint, using cloth for soft bodies, and cloth with wind.
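A simplified version of the measurement comparison might look like the following; the tolerance value is an illustrative assumption rather than a parameter of the innovation.

    def classify_fit(body_cm, cloth_cm, tolerance_cm=2.0):
        # Label each key measurement shared by body and cloth.
        result = {}
        for key in body_cm.keys() & cloth_cm.keys():
            ease = cloth_cm[key] - body_cm[key]  # positive ease = room to spare
            if ease < 0:
                result[key] = "tight"   # could map to red in the color overlay
            elif ease > tolerance_cm:
                result[key] = "loose"   # e.g. blue
            else:
                result[key] = "good"    # e.g. green
        return result

    # e.g. classify_fit({"chest": 96.0}, {"chest": 100.0}) -> {"chest": "loose"}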
The User Output subsystem 105 consists of a web or mobile based 3D model viewer or player using the WebGL and WebVR technologies. The solution has a 3D viewer based on libraries like Three.js, Vizor, X3DOM, Babylon, and WhitestormJS, or can integrate open source 3D viewers like the VA3C viewer, A-Frame, PlayCanvas, Potree, and Pannellum. Alternatively, the solution can use a third-party online 3D model viewer like the Sketchfab viewer or Marmoset Viewer to preview the final model with the fitting information.
Fashion Recommendation Subsystem
The most important part of the innovation is providing fashion recommendations that engage customers through retargeting or remarketing suggestions.
Cloth recognition is done by a modified Inception-based TensorFlow CNN model 607. The model is trained for cloth identification 605 on various cloth datasets 601. The resulting model can identify clothes in pictures and generate relevant metadata. It is primarily used to identify cloth fashion data from social and popular media 604 and, when combined with rankings, enhances the open fashion datasets 602. The clothing merchandise database 603 can also use this model for cloth identification if required.
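As a sketch of how inference with such a model might look, assuming a fine-tuned Keras/TensorFlow classifier saved to disk; the model file name and label list are placeholders.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.preprocessing import image

    model = tf.keras.models.load_model("cloth_identifier.h5")  # hypothetical file
    LABELS = ["t-shirt", "dress", "jeans", "jacket"]           # illustrative classes

    def identify_cloth(image_path):
        # Inception-style models expect 299x299 RGB input.
        img = image.load_img(image_path, target_size=(299, 299))
        x = image.img_to_array(img)[np.newaxis] / 255.0
        probs = model.predict(x)[0]
        return LABELS[int(np.argmax(probs))], float(np.max(probs))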
Fashion suggestions are provided to customers by the fashion cloth recommendation model 608. Initially, the model is trained 606 on open fashion image datasets 602 for fashion recommendations (see the fashion datasets section). The model can make suggestions based on physical attributes from the body measurement base 109. The model is further used to improve the fashion datasets based on the frequency and popularity of clothes for different body sizes and shapes.
The recommendation inventory search 609 uses the recommendation metadata generated by the fashion cloth recommendation model 608 to look for similar items in the clothing merchandise database 603.
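One simple way to realize such a search is a nearest-neighbour lookup over item embedding vectors; how items are embedded and how the inventory is stored are assumptions here.

    import numpy as np

    def find_similar_items(query_vec, inventory, top_k=5):
        # inventory: list of (item_id, embedding) pairs from the clothing
        # merchandise database 603; returns the top_k most similar item ids.
        q = np.asarray(query_vec, dtype=float)
        q = q / np.linalg.norm(q)
        scored = []
        for item_id, vec in inventory:
            v = np.asarray(vec, dtype=float)
            scored.append((float(np.dot(q, v / np.linalg.norm(v))), item_id))
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:top_k]]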
The character fitting subsystem 610 takes as input the customer's 3D character from the character base 104, built during the character creation pipeline 103. It also takes the merchandise suggestions from the recommendation inventory search 609 and fits them on the character using cloth simulation and principles similar to the virtual fitting subsystem 111, generating the final output, a 3D character with fashion recommendations 611. Customers can post and share the generated pictures and models with clothes and metadata online.
Fashion Datasets Used
The following is a list of Open Cloth Datasets 601 and Open Fashion Datasets 602 that might be used by the innovation.
DeepFashion is a large-scale clothes database containing over 800 thousand diverse fashion consumer photos, annotated with rich information about clothing items. Each image in the dataset is labeled with 50 categories, 1,000 descriptive attributes, a bounding box, and clothing landmarks. It contains over 300 thousand cross-pose/cross-domain image pairs. Benchmarks are available for attribute prediction and consumer clothes retrieval; their data and annotations can be employed as training and test sets for computer vision tasks such as clothes detection, clothes recognition, and image retrieval. The ACCV12 dataset covers apparel classification with style. It extends Random Forests to be capable of transfer learning from different domains to connect data. It defines clothing classes and introduces a benchmark dataset for the clothing classification task consisting of over 80 thousand images. It is publicly available with a classifier that outperforms an SVM baseline with 41.38% vs. 35.07% average accuracy on challenging benchmark data.
The CCP dataset from “Clothing Co-Parsing by Joint Image Segmentation and Labeling” (CVPR 2014) is a clothing database of elaborately annotated clothing items. It consists of over 2 thousand high-resolution street fashion photos with 59 tags in total, covering a wide range of styles, accessories, garments, and poses. All images have image-level annotations, and over 1,000 images have pixel-level annotations. The Fashionista dataset consists of over 158 thousand images without annotation, collected from chictopia.com in 2011. The annotation metadata can be generated with the cloth identification model 607 as described above.
The Fashion 10.000 dataset is composed of a set of Creative Commons images collected from Flickr. The dataset contains over 32 thousand images distributed across 262 fashion and clothing categories. The dataset is named “Fashion 10.000” because it initially contained just over 10 thousand fashion-related images. The dataset comes with a set of annotations generated using the Amazon Mechanical Turk (AMT) human computation platform. The annotations target 6 different aspects of the images, obtained by asking AMT workers 6 questions. The Neuroaesthetics in Fashion 144k dataset consists of 144,169 user posts with images and their associated metadata. It exploits the votes given to each post by different users to obtain a measure of fashionability, that is, how fashionable the user and their outfit in the image are. It proposes the challenging task of identifying the fashionability of the posts and presents an approach that, by combining many different sources of information, not only predicts fashionability but is also able to give fashion advice to the users.
The Stanford clothing attributes dataset was introduced to promote research in learning visual attributes for objects. The dataset contains 1,856 images with 26 ground-truth clothing attributes such as “long-sleeves”, “has a collar”, and “striped pattern”.