The embodiments disclosed herein relate to creating maps from architectural drawings, and, in particular, to systems and methods for machine-assisted map editing.
Traditional digital map creation tools are essentially drawing tools, allowing a user to draw lines, shapes, and other features with an input device (e.g., a mouse or stylus). This is often done by tracing an existing drawing (also known as a “base map”) or, if the base map is vector data, by extracting certain elements of it. To trace a map, a human user manually draws the walls, doors, stairs, and other features using drawing tools such as Adobe Illustrator® or AutoCAD®, or GIS tools such as ESRI ArcGIS® when tracing satellite imagery. Creating a map in this traditional way is time consuming and prone to human error.
Architectural drawings and floor plans may be provided in a plurality of formats. Frequently, architectural drawings or floor plans are provided in a simple image format (e.g., bitmap, .jpeg, .png), generated from a computer-aided-drafting (CAD) system or an image editing program, such as Adobe Photoshop™, or scanned from a hardcopy document. Such simple format images may be easily read by trained individuals; however, the variety of formats, notation styles and architectural symbols may make them difficult for untrained individuals to read. Also, the individual must be trained to use the CAD or image editing software.
It may be advantageous to convert simple format architectural plans, or scanned architectural drawings, to more machine-readable formats, as this may enable certain functions, such as the conversion of the base map to a digital map for editing purposes.
Accordingly, there is a need for new and improved systems and methods for machine-assisted map editing that make map creation significantly faster and more efficient, allowing an average user with no CAD or image software experience to create a compelling digital map of indoor spaces.
The machine-assisted map editing tools described herein provide a fast and efficient way for untrained users with no image/map editing experience to trace an architectural base map to generate an editable map of architectural features. The base map may be provided as a CAD file, an image file or a scanned file of a hardcopy architectural drawing.
According to some embodiments, there is a method for machine-assisted map editing. The method comprises classifying architectural data in a base map by a trained machine learning neural network, generating a vectorized polygon representation of the classified architectural data, presenting the base map on a display interface, receiving user input tracing an architectural feature in the base map, and proposing or automatically placing a polygon object representing the traced feature in an editable map. The base map may be presented to the user on a display interface with the editable map superimposed on the base map.
The method may further comprise training the neural network using a paired set of training images including a set of base map images as inputs and a set of vectorized polygon representations as outputs, wherein each base map image is associated with a vectorized polygon representation.
The method may include receiving user input modifying the polygon object, updating the editable map to show the modified polygon object and feeding back the user input modifying the polygon object to the neural network to further train the neural network to automatically place the polygon object correctly.
The method may further comprise detecting connections in the base map by the trained neural network, automatically placing the connections in the editable map, and assigning a floor span for each connection.
The method may further comprise identifying an enclosed area in the editable map, classifying polygon objects, symbols and/or text within the enclosed area by the neural network, and assigning a room type for the enclosed area based on the classification.
Other aspects and features will become apparent, to those ordinarily skilled in the art, upon review of the following description of some exemplary embodiments.
The drawings included herewith are for illustrating various examples of articles, methods, and apparatuses of the present specification. In the drawings:
Various apparatuses or processes will be described below to provide an example of each claimed embodiment. No embodiment described below limits any claimed embodiment and any claimed embodiment may cover processes or apparatuses that differ from those described below. The claimed embodiments are not limited to apparatuses or processes having all of the features of any one apparatus or process described below or to features common to multiple or all of the apparatuses described below.
One or more systems described herein may be implemented in computer programs executing on programmable computers, each comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. For example, and without limitation, the programmable computer may be a programmable logic unit, a mainframe computer, a server, a personal computer, a cloud-based program or system, a laptop, a personal data assistant, a cellular telephone, a smartphone, or a tablet device.
Each program is preferably implemented in a high-level procedural or object-oriented programming and/or scripting language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or a device readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention.
Further, although process steps, method steps, algorithms or the like may be described (in the disclosure and/or in the claims) in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order that is practical. Further, some steps may be performed simultaneously.
When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article.
The following relates to systems and methods for machine-assisted digital map editing tools. The systems and methods described herein may be used to create an editable digital map from an architectural drawing file (e.g., a CAD file), an image file (e.g., a bitmap, a JPEG or a PNG file), or a scanned architectural drawing file (e.g., a PDF file).
Referring to
The base map 102 is presented to the user on a display interface (e.g., a monitor or a touchscreen) and traced by a user using an input device (e.g., a mouse or a stylus) to create the editable map 106. The editable map 106 may be superimposed on the base map 102 as it is traced by the user.
An invisible machine learning (ML) layer 104 operates between the base map 102 and the editable map 106 to generate vectorized polygons for placement in the editable map 106. Vectorized polygon objects representing the architectural data may advantageously be scaled infinitely without loss of quality, and may be easily parsed by a machine to create the editable digital map 106, 3D models and more, for various applications.
The ML layer 104 incorporates a trained ML model (
Compared to existing map editing tools, the smart machine-assisted map editing tools described herein can make drawing a map significantly faster and more efficient by automating some or all steps that would typically have to be done manually by a user. For example, the manual steps of 1) drawing a wall with a first drawing tool and 2) adding an entrance to the wall with a second drawing or selection tool can be combined and automated as described below. In addition, according to various embodiments, the ML layer 104 may provide smart tools for: snapping wall vertices (including the first one that is traced) together to form walls; automatically styling walls/objects based on the part of the base map 102 they are being drawn over; and/or suggesting tooling or editing features based on the base map 102.
Different from traditional “pure” machine learning tools, the user can confirm or modify smart tool operations performed by the ML layer 104, and this feedback is fed back into the ML model. In this regard, user input reduces the problem of false positives/negatives by giving immediate feedback that can be used to correct or further train the ML model. Overall, this mitigates the need for perfect accuracy in the ML model.
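By way of illustration only, the following minimal sketch shows one way user confirmations or corrections might be collected as additional training pairs for the ML model; the data structure and field names are assumptions rather than requirements of any embodiment.

```python
def record_correction(training_pairs, base_map_crop, predicted_polygon, corrected_polygon):
    """Append a (base map region, corrected polygon) pair so that user edits
    can later be used to correct or further train the ML model."""
    training_pairs.append({
        "input": base_map_crop,           # pixels around the traced feature
        "target": corrected_polygon,      # polygon as confirmed or edited by the user
        "prediction": predicted_polygon,  # what the model originally proposed
    })
    return training_pairs
```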
Referring to
At 122, architectural data in a base map is classified by a trained ML neural network.
An exemplary base map 200 is shown in
The ML neural network is preferably a generative adversarial network (GAN) trained to classify markings or symbols in the base map as architectural features using paired training set data. For example, the ML neural network is trained to recognize various markings or symbols in the base map 200 that indicate the location of doors 206a, 206b, 206c and classify those features as doors. In the same manner, the ML neural network is trained to recognize walls 202, windows 204, connections 208, etc.
In some base maps 200, distortion or artifacts (not shown in
Referring back to
Generally, all architectural features classified at step 122 are represented as vectorized polygons defined by at least three points (vertices) in a coordinate system. Variables for different classes of architectural features/polygon objects are defined with values suited to that particular class. For example, walls 202 in
Minimum wall length defines the minimum length for a line in the base map to be classified as a wall; if a detected wall is below the minimum wall length, it may be disregarded. Maximum door length defines the maximum size for a door; if a detected door is too large, it may be disregarded as a door. The maximum vertex snap distance defines that if a detected door is only a foot wide, it is snapped to a minimum door width. The maximum wall snap distance defines that if an angle between two walls is within a threshold of a major angle, for example 2 degrees away from 90 degrees, it is snapped to 90 degrees. The maximum wall close distance defines that if two disjointed walls are within a threshold distance of each other, they can be snapped together into a single wall.
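A minimal sketch of how such class-specific variables might be represented and applied is given below; the names, units and default values are illustrative assumptions only.

```python
import math
from dataclasses import dataclass


@dataclass
class WallThresholds:
    """Illustrative per-class variables for wall polygon objects (values are assumptions)."""
    min_wall_length: float = 0.5           # metres; shorter detections are disregarded
    max_vertex_snap_distance: float = 0.3  # metres; vertices closer than this are merged
    max_wall_snap_angle: float = 2.0       # degrees; angles this close to a major angle are snapped
    max_wall_close_distance: float = 0.25  # metres; disjointed walls this close are joined


def snap_angle(angle_deg: float, t: WallThresholds) -> float:
    """Snap an inter-wall angle to the nearest major angle (0, 90, 180, 270) when within threshold."""
    for major in (0.0, 90.0, 180.0, 270.0):
        if abs(angle_deg - major) <= t.max_wall_snap_angle:
            return major
    return angle_deg


def should_merge_vertices(v1, v2, t: WallThresholds) -> bool:
    """Return True when two wall vertices fall within the maximum vertex snap distance."""
    return math.dist(v1, v2) <= t.max_vertex_snap_distance
```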
Referring again to
At 128, a user input tracing an architectural feature in the base map is received. The user input includes the position(s) or coordinate(s) of the trace made by the user in the base map. Tracing an architectural feature may include only tracing a portion of the entire architectural feature shown in the base map. For example, the user input may be tracing a section of a wall. According to some embodiments, it is preferred for the user to start tracing at a vertex of the architectural feature (e.g., a corner or an end of a wall) as described below. According to other embodiments, the user input may include tracing an area around the architectural feature (e.g., drawing a box or a circle around a window).
At 130, a polygon object representing the architectural feature traced at step 128 is proposed for placement, or automatically placed, in the editable map. Steps 128 and 130 may be performed concurrently so that the polygon object is placed in the editable map as it is being traced by the user, or immediately thereafter. The position/coordinates of the traced feature in the base map correspond to the position/coordinates of that same feature in the vectorized polygon representation of the architectural data generated at step 124 and stored in the memory. Accordingly, the map editing system knows (or has predicted) the polygon object that corresponds to the traced feature, and places the polygon object in the editable map at the position/coordinates of the traced feature.
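One way the traced coordinates might be matched to a pre-generated polygon is sketched below; the function name, data layout and distance metric are assumptions for illustration only.

```python
import math


def propose_polygon_for_trace(trace_points, polygons, max_match_distance=1.0):
    """Given user trace coordinates in base-map space, return the pre-generated
    vectorized polygon closest to the trace, or None if nothing is near enough.
    `polygons` is a list of dicts like {"class": "wall", "vertices": [(x, y), ...]}."""
    def nearest_vertex_distance(pt, poly):
        return min(math.dist(pt, v) for v in poly["vertices"])

    best, best_dist = None, float("inf")
    for poly in polygons:
        # Score a candidate by the average distance from each traced point
        # to its nearest polygon vertex.
        d = sum(nearest_vertex_distance(p, poly) for p in trace_points) / len(trace_points)
        if d < best_dist:
            best, best_dist = poly, d

    # Only propose the polygon when the trace actually lies near it.
    return best if best_dist <= max_match_distance else None
```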
Following step 130, the method 120 may cycle through steps 128 and 130 until all walls in the base map are traced and placed in the editable map. The user does not need to trace each and every architectural feature in the base map separately as described below.
Referring to
In
In
In
In
Referring back to
At 134, the editable map may be updated to show the modified polygon object depending on whether a modification was received at step 132. Following step 134, the method 120 cycles through steps 128, 130 (and in some cases steps 132 and 134) until all architectural features in the base map are traced and placed in the editable map.
At 136, the user inputs correcting or modifying the polygon objects are fed back to the ML model (
Referring to
At 142, connections in the base map are detected by a trained ML neural network. Connections refer to stairways, elevators, walkways, or the like, that connect the area depicted in the base map to another floor or area not depicted in the base map. The ML neural network is preferably a GAN trained to classify lines, markings or symbols denoting connections in the base map using paired training set data. For example, the ML neural network may be trained to recognize lines, markings or symbols in the base map that indicate the location of connections.
At 144, in some embodiments, the connections detected in the base map are proposed for placement in the editable map. The position/coordinates of a connection in the base map correspond to the position/coordinates of that same connection in the vectorized polygon representation of the architectural data stored in the memory. Accordingly, the map editing system knows (or has predicted) the position of the connection, and proposes placement of the connection in the editable map.
At 146, in some embodiments, user input confirming placement of the connections is received. Given that the editable map is superimposed on the base map, the user can quickly and easily confirm whether the proposed connection placement in the editable map is actually a connection in the base map.
At 148, the connections are placed in the editable map.
Referring to
Referring back to
For automatic assignment of floor spans, the map editing system may be configured to align base maps to detect the same connection across floors within some expected proximity. For example, if an elevator is detected on floors 1 and 2 (i.e., detected in the base maps of floors 1 and 2) and it is in the same position on both floors, the elevator is accordingly assigned a span from floor 1 to floor 2. Automatic floor span assignment for stairways is done by looking for the connection's position to fall within some range to account for the slope and direction of said slope. If a stairway connection is not present in an expected position on the floor above or the floor below, it is inferred that the stairway ended on the previous floor and the floor span is assigned accordingly.
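A minimal sketch of such an alignment across floors is shown below; it assumes connection positions expressed in a common coordinate system per floor, and the distance threshold is illustrative.

```python
import math


def assign_floor_spans(connections_by_floor, max_offset=2.0):
    """Infer a floor span for each detected connection by looking for the same
    connection at approximately the same position on adjacent floors.
    `connections_by_floor` maps floor number -> list of (x, y) connection positions."""
    spans = []
    for floor in sorted(connections_by_floor):
        for pos in connections_by_floor[floor]:
            top = floor
            # Extend the span upward while the connection reappears within `max_offset`.
            while top + 1 in connections_by_floor and any(
                math.dist(pos, other) <= max_offset
                for other in connections_by_floor[top + 1]
            ):
                top += 1
            if top > floor:
                spans.append({"position": pos, "from_floor": floor, "to_floor": top})
    # A practical implementation would also de-duplicate overlapping sub-spans.
    return spans
```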
Referring to
Referring to
At 162, an enclosed area in an editable map is identified. Generally, an enclosed area is an area in the editable map that is completely traced and bounded by walls on four sides. According to various embodiments, enclosed areas may be identified based on other criteria. For example, the enclosed space may have certain minimum/maximum dimensions, have at least one door, etc.
At 164, polygon objects within the enclosed space are classified by a trained ML neural network. The ML neural network is preferably a GAN trained to classify markings or symbols denoting objects such as furniture (e.g., desks, beds, tables, chairs), fixtures (e.g., sinks, toilets, showers), or other objects (e.g., computers, equipment). The ML neural network may also be trained to classify text labels within the enclosed space in the base map.
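Based on such a classification, a room type may then be assigned to the enclosed area. The keyword rules in the following sketch are a hedged illustration of one possible mapping; they are assumptions, not a required rule set.

```python
def infer_room_type(classified_objects, text_labels=()):
    """Assign a room type for an enclosed area from the objects, symbols and
    text classified within it (rules are illustrative only)."""
    labels = " ".join(text_labels).lower()
    objects = set(classified_objects)

    if {"toilet", "sink", "shower"} & objects or "washroom" in labels:
        return "washroom"
    if {"desk", "computer"} & objects or "office" in labels:
        return "office"
    if "bed" in objects:
        return "bedroom"
    if {"table", "chair"} & objects and "meeting" in labels:
        return "meeting room"
    return "unclassified"
```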
Referring to
Referring again to
Referring now to
After training, the generator 806 is configured to generate a sample vectorized polygon representation upon command, or upon input of a random noise vector 802, and the discriminator 808 is configured to be provided with the sample vectorized polygon representation. The discriminator 808 may comprise a binary classification network, which is configured to output a binary determination (at 810) as to whether the provided sample is a real sample, as per provided real-world data 804, or a generated/incorrect sample, in comparison to the real-world data 804. The discriminator 808 result 810, as well as accompanying data, may be fed back to both the generator 806 and discriminator 808, such that the generator 806 may update parameters of its model, and iterate the generated sample.
The model 800 may be run continuously until the discriminator 808 outputs a determination, at 810, that the generated sample is “real”, i.e., the discriminator 808 is unable to discriminate between a real-world data 804 sample and the generated sample. At such a point, the sample is determined to be the final output of the model 800.
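For illustration, a single adversarial training step of the kind described above might look as follows in a PyTorch-style sketch; the model interfaces (a generator taking a noise vector and a discriminator returning a single logit per sample) are assumptions.

```python
import torch


def train_gan_step(generator, discriminator, g_opt, d_opt, real_batch, noise_dim=100):
    """One illustrative GAN training step with a binary real/generated discriminator."""
    bce = torch.nn.BCEWithLogitsLoss()
    batch = real_batch.size(0)

    # Discriminator update: real samples labelled 1, generated samples labelled 0.
    noise = torch.randn(batch, noise_dim)
    fake = generator(noise).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: try to make the discriminator label generated samples as "real".
    fake = generator(torch.randn(batch, noise_dim))
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```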
In the example of the methods 120, 140 and 160, a GAN such as the GAN 800 described above may be employed as the trained ML model. In some examples, a variation of a GAN, similar to GAN 800 may be employed in the methods 120, 140 and 160. For example, a GAN purposely designed for image style shifting or image to image translation may be applied. Such GANs are configured to convert an image from one style domain (e.g., the base map) to another style domain (e.g., the vectorized polygon representation), according to the provided paired training dataset.
According to some embodiments, pix2pixHD or a similar ML model or software package configured for image style transfer applications may be employed by methods 120, 140 and 160 as the ML model. Pix2pixHD comprises a GAN based machine learning model, which may synthesize high resolution images (e.g., up to 2048×1024 pixels), and may provide for image style translation functionality when provided with an appropriate paired image training set. The pix2pixHD generator component comprises groups of convolution layers, followed by groups of residual network layers followed by groups of deconvolution layers. The pix2pixHD discriminator component comprises a multiscale discriminator, wherein multiple discriminators are employed, each evaluating inputs at different scales, such that various receptive fields and levels of detail may be evaluated by the discriminator.
In some examples, wherein pix2pixHD is employed, hyperparameters may include number of iterations, number of decay iterations, learning rate, loadSize, fineSize, Adam Optimizer Momentum, BatchSize, and n_layers_D. In some examples, pix2pixHD hyperparameters may be configured such that number of iterations=200, number of decay iterations=200, learning rate=0.0002, loadSize=1024, fineSize=512, Adam Optimizer Momentum=0.5, BatchSize=1, and n_layers_D=3. Such a configuration may result in good performance and high-quality output. In other examples, hyperparameters may vary from the hyperparameters provided herein.
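Assuming the publicly available pix2pixHD training script and its command-line option names, the hyperparameters above might be assembled along the following lines; this is an illustrative sketch, not a required invocation.

```python
# Illustrative mapping of the hyperparameters above onto pix2pixHD-style training
# options (option names assumed from the public pix2pixHD repository).
train_options = {
    "niter": 200,        # number of iterations at the initial learning rate
    "niter_decay": 200,  # number of decay iterations
    "lr": 0.0002,        # learning rate
    "loadSize": 1024,
    "fineSize": 512,
    "beta1": 0.5,        # Adam optimizer momentum
    "batchSize": 1,
    "n_layers_D": 3,
}

command = ["python", "train.py", "--name", "basemap2polygons"] + [
    arg for key, value in train_options.items() for arg in (f"--{key}", str(value))
]
print(" ".join(command))
```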
In examples wherein pix2pixHD or a similar method is applied, a paired training set of images may be provided to the GAN as real world data 804 at the training phase to train the model. For example, the user may collect a plurality of architectural plan data, i.e., base maps. Preferably, the collection of base maps is relatively similar in style and configuration to the base maps that the user wishes to vectorize using the methods described herein. For example, if the base maps that the operator wishes to vectorize comprise certain line styles to denote walls and windows, a collection of training base maps comprising images with similar or matching line styles may maximize the effectiveness of the methods described herein.
The user may generate corresponding vectorized polygon representations for each base map, by manually drawing, sizing and annotating polygon objects, resulting in a paired training set. Once the user has collected the plurality of paired training images, the paired training set may be provided to a ML model (e.g., GAN 800), to train the ML model.
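One simple way to assemble such a paired training set, assuming base map images and their manually drawn counterparts are stored with matching file names, is sketched below.

```python
from pathlib import Path


def build_paired_training_set(base_map_dir, polygon_dir):
    """Pair each base map image with its manually drawn vectorized polygon
    representation by matching file names (directory layout is an assumption)."""
    pairs = []
    for base_path in sorted(Path(base_map_dir).glob("*.png")):
        target_path = Path(polygon_dir) / base_path.name
        if target_path.exists():
            pairs.append((base_path, target_path))
    return pairs
```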
Other ML models (for classification of architectural data in base maps, for example detection of connections; identification of enclosed areas in editable maps; and classification of objects, symbols or text within enclosed areas) may also be provided with the same general architecture as the GAN 800 and trained in a similar manner as described above.
For example, a paired training set of images for classification of architectural features in base maps may be created by providing a first set of base maps images and a second set of the base map images manually labelled/annotated with pixel-wise classification of the architectural data therein to train the model to detect/identify walls, doors, windows, desks, tables, toilets, sinks, etc.
After training the ML model, instead of inputting random noise vector 802 into generator 806, a base map image may be inputted. The generator 806 may then generate an output corresponding to the inputted image. For example, a base map, such as
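A minimal inference sketch along these lines is shown below; it assumes a PyTorch-style generator and illustrative pre- and post-processing choices.

```python
import torch
from PIL import Image
from torchvision import transforms


def vectorize_base_map(generator, base_map_path, image_size=512):
    """Run a trained generator on a base map image (instead of a random noise
    vector) to produce the translated, vectorized-polygon-style output image."""
    to_tensor = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    image = to_tensor(Image.open(base_map_path).convert("RGB")).unsqueeze(0)

    generator.eval()
    with torch.no_grad():
        output = generator(image)

    # Map the network output back to an 8-bit image for inspection or further parsing.
    return transforms.ToPILImage()((output[0].clamp(-1, 1) + 1) / 2)
```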
Referring now to
The system 900 comprises a memory 904 and at least one processor 902. According to some embodiments, the at least one processor 902 may be any processor known in the art for executing machine instructions. According to an embodiment, the at least one processor 902 is at least one GPU specifically configured for image processing and/or specifically configured to execute GAN 800. According to an embodiment, the system 900 includes a first processor configured to execute the generator 806 neural network and a second processor configured to execute the discriminator 808 neural network.
The memory 904 may be any form of memory known in the art that may store machine instruction data, and input/output data, such as training set data, input architectural plan data, and output translated architectural plan data. Memory 904 and processor 902 are configured such that they may readily communicate with and pass data to one another.
The system 900 comprises a display interface 922 (e.g., a touchscreen, a monitor or other display screen) and an input device 920 (e.g., a mouse, a stylus). A system bus 924 and/or one or more interface circuits (not shown) couple the processor 902 to the memory 904 and the other components 920, 922.
The memory 904 stores architectural plan data 906, one or more paired training data sets 908, vectorized polygon representations 910, one or more machine learning models 912, a processing module 914 and editable maps 916.
Architectural plan data 906 may comprise base maps, exemplified by the base map 200 in
Paired training data set 908 may comprise a paired set of base maps and vectorized polygon representations. Referring to
The vectorized polygon representations 910 may comprise the output of the machine learning model 912. For example, the vectorized polygon representations 910 may resemble the translated architectural plan 204 of
The machine learning model 912 may comprise a machine learning model configured to accept a base map 906 as an input, and output a vectorized polygon representation 910. In some examples, the machine learning model 912 may comprise a generative adversarial network, like the network 800 shown in
The processing module 914 may comprise a software module which provides smart map editing tools to the user for operations described above with reference to the methods 120, 140 and 160. For example, the processing module 914 may render the editable map and superimpose the editable map over the base map on the display interface 922 as the user traces architectural features in the base map with the input device 920.
While the above description provides examples of one or more apparatus, methods, or systems, it will be appreciated that other apparatus, methods, or systems may be within the scope of the claims as interpreted by one of skill in the art.
Number | Date | Country
63500079 | May 2023 | US