The present disclosure relates generally to the field of computer vision. More specifically, the present disclosure relates to computer vision systems and methods for segmenting and classifying building components, contents, materials or attributes.
In the insurance industry, various insurance-related actions such as insurance policy adjustments, insurance quote calculations, underwriting, inspections, claims processing, and/or property appraisal are often performed. For example, a human operator (e.g., a property inspector) often must physically go to a property site to inspect the property for assessments related to the above actions, and large amounts of paperwork must be generated and processed to evaluate a market value of the property, an insurance quote, a price for insurance coverage, a remodel cost, an investment value, and/or any related values and costs associated with the above actions based on the inspection. Further, to the extent that there are software tools that can assist with performing some of the foregoing tasks, such software tools are severely lacking in their technical capabilities. Still further, such systems require users to manually perform other actions, such as flagging changes in rooms over time and populating estimate fields in such software systems (e.g., pre-filling and/or post-checking estimate fields).
The foregoing operations involving multiple human operators are cumbersome and prone to human error. In some situations, the human operator may not be able to accurately and thoroughly capture all items (e.g., furniture, appliances, doors, windows, ceilings, fences, floors, walls, electronics, structure faces, roof structure, trees, pools, decks, etc.), or recognize the materials or attributes of those items, which may result in inaccurate assessments and human bias errors. Thus, what would be desirable are computer vision systems and methods for segmenting and classifying building components and contents, and their associated materials or attributes, which address the foregoing, and other, needs.
The present disclosure relates to computer vision systems and methods for segmenting and classifying building components, contents, materials, or attributes. The system obtains media content (e.g., a digital image, a video, a video frame, etc.) indicative of an asset (e.g., a real estate property). The system identifies and segments items (e.g., walls, doors, floors, materials, contents of structures, etc.) of the asset based on one or more segmentation models (e.g., neural-network-based segmentation models). Optionally, the system selects each of the segmented items (e.g., automatically using a mask or based on user input, etc.) and determines, based on one or more classification models (e.g., machine/deep-learning-based classifiers, transformers, etc.), a value associated with a material or other attribute classification for each of the segmented items. The value indicates how likely it is that the segmented item belongs to a particular material or attribute type (e.g., wood, laminate, etc.). The system determines a material or attribute type for each of the segmented items based on a comparison of this confidence value to pre-calculated threshold values. The threshold values define a cut-off indicative of a segmented item most likely being a particular type of material or attribute. Each material or attribute can have its own pre-calculated threshold value.
The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:
The present disclosure relates to computer vision systems and methods for segmenting and classifying building components, contents, materials and attributes, as described in detail below in connection with
Turning to the drawings,
An asset can be a resource insured and/or owned by a person or a company. Examples of an asset can include real estate property (e.g., residential properties such as a home, a house, a condo, an apartment, and commercial properties such as a company site, a commercial building, a retail store, etc.), a vehicle, or any other suitable properties. An asset can include one or more items, such as interior items and/or exterior items. Additionally, assets can include content items (e.g., personal property such as a refrigerator, television, etc.) present in a building/structure. Examples of the items are shown in Table 1 and Table 2.
The database 14 can include various types of data including, but not limited to, media content indicative of an asset as described below, one or more outputs from various components of the system 10 (e.g., outputs from a data collection engine 18a, an item segmentation engine 18b, a computer vision segmentation module 20a, a material or attribute detection engine 18c, a material or attribute classification module 20b, a training engine 18d, a training data collection module 20c, a feedback loop engine 18e, a value and cost estimation engine 18f, and/or other components of the system 10), one or more untrained and trained computer vision models, and associated training data, one or more untrained and trained classification models, and associated training data, and one or more data collection models. It is noted that the value and cost estimation engine 18f could comprise and/or communicate with one or more commercially available pricing databases, such as pricing databases provided by XACTWARE SOLUTIONS, INC. The system 10 includes system code 16 (non-transitory, computer-readable instructions) stored on a computer-readable medium and executable by the hardware processor 12 or one or more computer systems. The system code 16 can include various custom-written software modules that carry out the steps/processes discussed herein, and can include, but is not limited to, the data collection engine 18a, the item segmentation engine 18b, the computer vision segmentation module 20a, the material or attribute detection engine 18c, the classification module 20b, the training engine 18d, the training data collection module 20c, the feedback loop engine 18e, and the value and cost estimation engine 18f. The system code 16 can be programmed using any suitable programming languages including, but not limited to, C, C++, C#, Java, Python, or any other suitable language. Additionally, the system code 16 can be distributed across multiple computer systems in communication with each other over a communications network, and/or stored and executed on a cloud computing platform and remotely accessed by a computer system in communication with the cloud platform. The system code 16 can communicate with the database 14, which can be stored on the same computer system as the code 16, or on one or more other computer systems in communication with the code 16.
The media content can include digital images and/or digital image datasets including ground images, aerial images, satellite images, etc., where the digital images and/or digital image datasets could include, but are not limited to, images of the asset. Additionally, and/or alternatively, the media content can include videos of the asset, and/or frames of videos of the asset. The media content can also include one or more three-dimensional (3D) representations of the asset (including interior and exterior structure items), such as point clouds, depth maps, light detection and ranging (LiDAR) files, etc., and the system 10 could retrieve such 3D representations from the database 14 and operate with these 3D representations. Additionally, the system 10 could generate 3D representations of the asset, such as point clouds, depth maps, LiDAR files, etc., based on the digital images and/or digital image datasets. As such, by the terms “imagery” and “image” as used herein, it is meant not only 3D imagery and computer-generated imagery (e.g., LiDAR, point clouds, 3D images, etc.), but also two-dimensional (2D) imagery.
Still further, the system 10 can be embodied as a customized hardware component such as a field-programmable gate array (“FPGA”), an application-specific integrated circuit (“ASIC”), an embedded system, or other customized hardware components without departing from the spirit or scope of the present disclosure. It should be understood that
In step 54, the system 10 identifies and segments one or more items of the asset based at least in part on one or more segmentation models. As mentioned above, Table 1 and Table 2 show various examples of an item of the asset (e.g., real estate property). A segmentation model can identify one or more items in the media content and determine which pixels in the media content belong to the detected item. The segmentation model utilizes deep convolutional neural networks (CNNs) to perform an instance segmentation task, such that it detects objects (e.g., structural components and other items noted herein) and predicts a mask (a region of the media content belonging to a particular item) for each object to specify which pixels are to be considered part of the object. For example, as shown in
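By way of non-limiting illustration, the following is a minimal sketch of how such an instance segmentation step could be implemented using an off-the-shelf Mask R-CNN from the torchvision library; the pretrained weights, the 0.5 confidence and mask cut-offs, and the file name are illustrative assumptions rather than the specific models of the present disclosure:

```python
# Minimal sketch: instance segmentation with an off-the-shelf Mask R-CNN.
# The pretrained model, file name, and 0.5 cut-offs are assumptions.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("room.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]

# Keep detections above a confidence cut-off; each predicted mask marks
# the pixels considered part of the detected item.
for label, score, mask in zip(prediction["labels"],
                              prediction["scores"],
                              prediction["masks"]):
    if score >= 0.5:
        binary_mask = mask[0] > 0.5  # per-pixel membership in the item
        print(int(label), float(score), int(binary_mask.sum()), "pixels")
```

In practice, such a model would be trained or fine-tuned on the building components and contents enumerated in Table 1 and Table 2 rather than on generic object categories.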
Returning to
In step 58, the system 10 determines a material or attribute classification for the one or more segmented items. Preferably, only items for which material or attribute recognition makes sense are selected in this step. For example, a floor detection would be subject to material recognition, while a circuit breaker box would not, and a door or window would be subject to attribute classification. Further, a door, a window, or a ceiling can have a style attribute based on the make of the door, window, or ceiling, which can be predicted by the model. A classification model can place or identify a segmented item as belonging to a material or attribute classification, as applicable. Examples of material or attribute classifications of items are provided in Table 3. The placement of the segmented item into the particular material or attribute classification can be based on a value (e.g., a probability value, confidence score, or the like) associated with a material or attribute classification compared to a threshold value. The classification model can be a supervised machine/deep-learning-based classifier, such as a CNN-based classifier (e.g., a ResNet-based classifier, an AlexNet-based classifier, a VGG-16-based classifier, a GoogLeNet-based classifier, or the like). The classification model can include one or more binary classifiers and/or one or more multi-class classifiers. In some examples, the classification model can include a single classifier that identifies a material or attribute type for each segmented item in a region of interest (ROI). In other examples, the classification model can include multiple classifiers, each analyzing a particular item for material or attribute detection. The classifier takes as input both the full image and the ROI, as determined by the segmentation model or via user input. This acts as an ROI-based attention mechanism, telling the model which part of the image to classify while still providing the whole image as contextual input.
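By way of non-limiting illustration, the following is a minimal sketch of a two-branch classifier that implements such an ROI-based attention mechanism, with one backbone encoding the full image for context and a second backbone encoding the ROI crop; the choice of ResNet-18 backbones, the layer sizes, and the number of classes are illustrative assumptions:

```python
# Minimal sketch of an ROI-conditioned material/attribute classifier:
# the full image supplies context while the ROI (from the segmentation
# mask or user input) tells the model which item to classify.
import torch
import torch.nn as nn
import torchvision

class ROIAttentionClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # One backbone per branch: full-image context and ROI crop.
        self.context_branch = torchvision.models.resnet18(weights="DEFAULT")
        self.roi_branch = torchvision.models.resnet18(weights="DEFAULT")
        feat = self.context_branch.fc.in_features
        self.context_branch.fc = nn.Identity()
        self.roi_branch.fc = nn.Identity()
        # Fuse both feature vectors and score every material/attribute.
        self.head = nn.Linear(2 * feat, num_classes)

    def forward(self, full_image: torch.Tensor, roi_crop: torch.Tensor):
        fused = torch.cat([self.context_branch(full_image),
                           self.roi_branch(roi_crop)], dim=1)
        return self.head(fused)  # one score per material/attribute type

model = ROIAttentionClassifier(num_classes=8)
full = torch.randn(1, 3, 224, 224)  # whole scene
roi = torch.randn(1, 3, 224, 224)   # resized crop of the segmented item
scores = torch.sigmoid(model(full, roi))  # per-class confidence values
```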
The classification model can generate a value associated with a material or attribute classification of a segmented item based on a segmentation mask associated with the segmented item. The value can indicate how likely it is that the segmented item belongs to a particular material or attribute type. For example, a door can have a greater value associated with a wood material type than with a ceramic material type, indicating that the door is more likely to belong to the wood material type than to the ceramic material type. The classification model can further narrow down the likelihood using threshold values, as described below.
As noted above, the system 10 determines a material or attribute type for the one or more segmented items based at least in part on a comparison of the value to one or more threshold values. For example, continuing the above example, for a situation having a single threshold value indicative of a segmented item most likely belonging to a material or attribute type, if the classification model determines that the value exceeds (e.g., is equal to or greater than) the single threshold value, the classification model can determine that the segmented item (e.g., a door) is most likely to belong to the material or attribute type (e.g., a wood material type). If the value is less than the single threshold value, the classification model can determine that the segmented item (e.g., a door) is not likely to belong to that material or attribute type (e.g., a ceramic material type). Additionally, and/or alternatively, a multi-class classification model can generate multiple values associated with different material types or attributes, and can determine whether the segmented item belongs to one or more different material types or attributes. For a situation having more than one value and corresponding threshold value when using a multi-class model (e.g., a first value indicating that a segmented item is most likely to belong to a first material or attribute type, a second value indicating that the segmented item is most likely to belong to a second material or attribute type, and so forth), if the first value exceeds a first threshold value, the classification model can determine that the segmented item is most likely to belong to the first material or attribute type, and so forth. For each item, the classification model outputs a score for every possible material or attribute, and each score in the prediction is assigned a threshold value independently. It should be understood that classifiers of the segmentation models also perform similar functions to identify items. It should also be understood that the system 10 can perform the aforementioned task of classification via the material or attribute detection engine 18c and/or the material or attribute classification module 20b.
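By way of non-limiting illustration, the following is a minimal sketch of such a per-class threshold comparison; the material names and threshold values are illustrative assumptions, and in practice each pre-calculated threshold could be tuned independently (e.g., on validation data):

```python
# Minimal sketch of the per-class threshold comparison described above.
# Material names and threshold values are illustrative assumptions.
THRESHOLDS = {"wood": 0.70, "laminate": 0.65, "ceramic": 0.80}

def assign_types(scores: dict[str, float]) -> list[str]:
    """Return every material/attribute type whose score meets or
    exceeds that type's own pre-calculated threshold."""
    return [m for m, s in scores.items() if s >= THRESHOLDS[m]]

# A segmented door scoring high for wood and low for ceramic:
print(assign_types({"wood": 0.91, "laminate": 0.40, "ceramic": 0.12}))
# -> ['wood']
```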
For example, as shown in
In some examples, the system 10 can perform item segmentation and material or attribute detection for 3D representations of the asset. For instance, a LiDAR-enabled device can capture the 3D representations of the asset. A segmentation model can be used to process the image data. The resulting segmentation masks can then be represented via depth map data, a point cloud, a mesh, voxels, or any other 3D data format. This enables a finer-grained segmentation result, material or attribute recognition that takes a surface texture into account, automated area measurement, and 3D reconstructions of the scanned space. For example, the segmentation model can operate on two-dimensional (2D) values that are mapped to 3D values via depth mapping techniques. RGB-based segmentations can be overlaid onto RGBD data to create 3D segmentations. The RGBD data for segmented areas can be converted to a point cloud, voxels, or another 3D format for visualization via 3D scene reconstructions. Combining the 3D measurements with the segmented areas can automate area/object dimension calculations. As another example, the segmentation can be performed directly on 3D data. The segmentation can include one or more additional models to combine the 2D segmentation with depth for finer segmentation results. The additional models can be trained directly on 3D data to obtain a finer-grained segmentation, higher-accuracy material or attribute recognition, and accurate pose estimation and plane/surface detection. In some examples, the system 10 can perform a scene reconstruction via segmented point cloud space representations. As a LiDAR-enabled device moves throughout a space, it is capable of creating a reconstruction represented by a point cloud or mesh. Segmentation and classification models can calculate the class and material or attribute associated with each region of the point cloud or mesh. Automated LiDAR-based measurement technology can calculate the dimensions of each segmented region. Users can input information to clarify areas/objects in the reconstruction. Additional models can be used to predict any gaps within a scanned area to create a continuous surface for a resulting 3D model.
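By way of non-limiting illustration, the following is a minimal sketch of one way the depth-mapping step described above could lift a 2D segmentation mask into a 3D point cloud; the pinhole-camera intrinsics (fx, fy, cx, cy) and the constant depth map are illustrative assumptions:

```python
# Minimal sketch: back-project masked 2D pixels into 3D using a depth
# map and pinhole-camera intrinsics (illustrative assumptions).
import numpy as np

def mask_to_point_cloud(mask: np.ndarray, depth: np.ndarray,
                        fx: float, fy: float, cx: float, cy: float):
    """Back-project masked pixels (u, v) with depth d to 3D points."""
    v, u = np.nonzero(mask)   # pixel rows/cols inside the segmented item
    d = depth[v, u]           # metric depth per masked pixel
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=1)  # (N, 3) point cloud

mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 200:300] = True            # e.g., a segmented floor region
depth = np.full((480, 640), 2.5)         # e.g., LiDAR depth in meters
points = mask_to_point_cloud(mask, depth, fx=525.0, fy=525.0,
                             cx=320.0, cy=240.0)
print(points.shape)  # (10000, 3)
```

The resulting segmented points could then be converted to a mesh or voxels for visualization, or their extents measured to automate the area/object dimension calculations described above.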
In step 114, the system 10 retrieves media content of the item and the material or attribute type based at least in part on one or more data collection models. A data collection model can connect a text and/or verbal command with media content containing the information referenced by the text and/or verbal command, such as connecting the search query with one or more images depicting the item and the material or attribute type. A data collection model can include a machine/deep-learning-based model, such as a neural network model. Additionally, and/or alternatively, the data collection model can use a pre-prepared set of keywords and other queries which are then processed by the system, and the returned images are sorted by how well they match the queries to identify the most promising images (such sorting could be performed automatically by the system, or manually by users). The data collection model can retrieve the media content having the item and the material or attribute type (e.g., retrieved images 136 of
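By way of non-limiting illustration, the following is a minimal sketch of sorting returned images by how well they match a query; embed_text and embed_image are hypothetical placeholders for any joint text-image encoder (e.g., a CLIP-style model), implemented here as random stand-ins so the sketch runs:

```python
# Minimal sketch: rank candidate images by similarity to a text query.
# embed_text/embed_image are HYPOTHETICAL stand-ins for a real joint
# text-image encoder; replace them with an actual model in practice.
import numpy as np

def embed_text(query: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    return rng.standard_normal(512)  # stand-in text embedding

def embed_image(path: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(path)) % (2**32))
    return rng.standard_normal(512)  # stand-in image embedding

def rank_images(query: str, image_paths: list[str]) -> list[str]:
    q = embed_text(query)
    def score(path: str) -> float:
        v = embed_image(path)
        # Cosine similarity: higher means a better match to the query.
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(image_paths, key=score, reverse=True)

print(rank_images("hardwood floor living room", ["a.jpg", "b.jpg"]))
```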
In step 116, the media content is labeled with the item and the material or attribute type to generate a training dataset. For example, the system 10 can generate metadata indicative of the item and the material or attribute type and combine the metadata with the media content to generate a training dataset.
In step 118, the system 10 trains a segmentation model and a material or attribute type classification model based at least in part on the training dataset. For example, the system 10 can adjust one or more setting parameters (e.g., weights, or the like) of the segmentation model and the material or attribute classification model using the training dataset to minimize an error between the generated output and the expected output of the above models. In some examples, during the training process, the system 10 can generate one or more confidence values for an object to be identified as an expected item or for an identified item to be classified to an expected material or attribute type. In step 120, the system 10 receives feedback associated with an actual output after applying the trained segmentation model and the trained material or attribute classification model to an unseen asset. For example, a user can provide feedback if there is any discrepancy in the predictions.
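By way of non-limiting illustration, the following is a minimal sketch of such a supervised training step, reusing the two-branch classifier sketched above; the optimizer, loss function, and learning rate are illustrative assumptions:

```python
# Minimal sketch of the supervised training step described above:
# adjust model weights to reduce the error between predicted and
# expected material/attribute labels. The loader is assumed to yield
# (full_image, roi_crop, target) batches for the two-branch classifier
# sketched earlier; hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def train_epoch(model, loader, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # one score per material/attribute
    model.train()
    for full_image, roi_crop, target in loader:
        optimizer.zero_grad()
        logits = model(full_image, roi_crop)
        loss = criterion(logits, target)  # error vs. expected output
        loss.backward()                   # gradients w.r.t. the weights
        optimizer.step()                  # adjust the setting parameters
    return model
```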
In step 122, the system fine-tunes the trained segmentation model and the trained material or attribute classification model using the feedback. For instance, data associated with the feedback can be used to adjust the setting parameters of the segmentation model and the material or attribute classification model, and can be added to the training dataset to increase the accuracy of predicted results. In some examples, an item (e.g., a countertop) may previously have been determined to belong to a material type (e.g., a granite material type), while the feedback indicates that the item actually belongs to a different material or attribute type (e.g., a laminate material type). The system 10 can adjust (e.g., decrease or increase) weights to weaken the correlation between the item and the previously predicted material or attribute type. It should be understood that the system 10 can perform the aforementioned training steps via the training engine 18d and the training data collection module 20c, and can perform the aforementioned feedback task via the feedback loop engine 18e.
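By way of non-limiting illustration, the following is a minimal sketch of folding such feedback into the training dataset and fine-tuning; training_dataset, make_loader, and the file name are hypothetical, and train_epoch refers to the training sketch above:

```python
# Minimal sketch of the feedback loop: add the user-corrected sample
# to the training data and fine-tune at a reduced learning rate.
# training_dataset and make_loader are HYPOTHETICAL; train_epoch is
# the training sketch shown earlier.
corrected_sample = {
    "image": "countertop.jpg",  # hypothetical image path
    "item": "countertop",
    "material": "laminate",     # user correction: model predicted "granite"
}
training_dataset.append(corrected_sample)  # assumed in-memory dataset
# Fine-tuning at a lower learning rate nudges the weights so the old
# countertop->granite correlation is weakened without forgetting the
# rest of the training data.
model = train_epoch(model, make_loader(training_dataset), lr=1e-5)
```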
Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary and that a person skilled in the art can make variations and modifications without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure. What is desired to be protected by Letters Patent is set forth in the following claims.
The present application claims the benefit of priority of U.S. Provisional Application Ser. No. 63/289,726 filed on Dec. 15, 2021, the entire disclosure of which is expressly incorporated herein by reference.