The present disclosure pertains to a computer-implemented method for automatically generating maps comprising land-cover information of an area based on a plurality of input images. In particular, a texture comprising the generated land-cover information is generated and used to texture a three-dimensional mesh or an orthoimage of the area, so that the land-cover information can be provided to a user, e.g. as a two-dimensional land-cover map. The land-cover information is created from the plurality of input images using artificial intelligence (AI).
Generating maps with land-cover information using AI—e.g. including techniques of machine learning (ML) such as deep learning and feature learning—is an established topic of research. For instance, an approach using per-pixel classification in satellite images to determine land cover is described in D. Hester et al.: “Per-pixel Classification of High Spatial Resolution Satellite Imagery for Urban Land-cover Mapping”, Photogrammetric Engineering & Remote Sensing, Number 4/April 2008, pp. 463-471, American Society for Photogrammetry and Remote Sensing. Another approach is described in M. Herold et al.: “The spectral dimension in urban land cover mapping from high-resolution optical remote sensing data”, Proceedings of the 3rd Symposium on Remote Sensing of Urban Areas, June 2002, Istanbul.
However, existing approaches rely on orthophotos or satellite imagery, so that the resulting land-cover information is based on a single view only (or, e.g. in the case of overlapping orthoimages, on very similar views). Thus, disadvantageously, the land-cover information of some parts of the area may not be determined with sufficient certainty. Also, some areas may be occluded in the single view, e.g. because of objects blocking the view between a satellite or aerial camera and the ground (e.g. vegetation such as trees, mobile objects such as vehicles, or roofing such as covered walkways). In this case, the land-cover information related to the ground at these areas cannot be determined directly but has to be estimated, e.g. based on the visible surrounding areas.
It would be desirable to provide a method that increases the certainty in determining the land-cover information and allows directly determining the ground land cover of areas that are occluded in orthoimages.
It is therefore an object of the present disclosure to provide an improved computer-implemented method for automatically generating land-cover information of an area.
It is another object to provide such a method that allows generating the land-cover information with higher certainty.
It is another object to provide such a method that allows generating a land-cover map using the land-cover information.
At least one of these objects is achieved by the embodiments described herein.
A first aspect pertains to a computer-implemented method for generating one or more land-cover maps of an area. The method comprises the following steps that are executed in a computer system:
- receiving a plurality of input images of the area, the input images having been captured by a plurality of cameras;
- performing a semantic segmentation in the input images, comprising identifying, for each pixel of each input image, probabilities of a set of land-cover classes;
- generating a 3D mesh of the area using the input images and a structure-from-motion (SfM) algorithm;
- projecting the identified probabilities onto the 3D mesh;
- weighting the projected probabilities;
- determining overall probabilities of the land-cover classes based on the weighted probabilities; and
- assigning the overall probabilities to the pixels of the one or more land-cover maps.
According to the first aspect, the method further comprises:
According to one embodiment, the method comprises
In one embodiment, a plurality of different land-cover maps are generated for the same area, and the method comprises receiving a user input that comprises a selection of one of the plurality of generated land-cover maps to be displayed, and displaying the selected land-cover map on the screen. Optionally, indicators of selectable land-cover maps of the plurality of land-cover maps are displayed and the user input comprises selecting one of the selectable land-cover maps.
According to another embodiment of the method, the one or more land-cover maps comprise at least a combined land-cover map showing the most probable land-cover class for every pixel of the map.
According to yet another embodiment of the method, the one or more land-cover maps comprise at least one or more per-class land-cover maps showing the probability of one land-cover class for every pixel of the map.
According to a further embodiment of the method, the one or more land-cover maps comprise at least one 2D land-cover map that is generated based on the 3D mesh. For instance, the 2D land-cover map may be generated by rasterization of the 3D mesh to an orthographic view.
In one embodiment, for generating the 2D land-cover map, a ray is created for each pixel of said 2D land-cover map, which ray runs in a vertical direction from the respective pixel through the 3D mesh, the ray crossing a surface of the 3D mesh at one or more crossing points.
In one embodiment, the area comprises 3D objects including buildings, vehicles and/or trees. In this case, the at least one 2D land-cover map may comprise:
- a vision-related 2D land-cover map that respects the occlusions by the 3D objects in the orthographic view, and/or
- a ground-related 2D land-cover map that ignores the occlusions by the 3D objects, thus showing the land cover of the ground below trees and overhangs of buildings.
According to another embodiment of the method, the one or more land-cover maps comprise at least one 3D model of the area, which 3D model is generated based on the 3D mesh. For instance, the 3D model is a classified mesh or point cloud, and/or shows the most probable land-cover class.
According to another embodiment, the method comprises receiving an orthoimage of the area. For instance, the pixels of the land-cover map may correspond to at least a subset of the pixels of the orthoimage. In one embodiment, the plurality of cameras is selected based on the orthoimage.
According to another embodiment of the method, the plurality of input images comprise
According to another embodiment, the method comprises receiving depth information and using the depth information for generating the 3D mesh. For instance, at least a subset of the cameras may be embodied as a stereo camera or as a range-imaging camera and configured to provide said depth information.
According to yet another embodiment of the method, the semantic segmentation in the input images is performed using artificial intelligence (AI) and a trained neural network, e.g. using a machine-learning, deep-learning or feature-learning algorithm. In one embodiment, the set of land-cover classes comprises at least ten land-cover classes, more particularly at least twenty land-cover classes.
According to one embodiment of the method, the weighting comprises weighting probabilities of a set of single-image probability values the higher, the more acute the angle of an image axis of the input image of the respective set of single-image probability values is relative to the 3D mesh at a surface point of the 3D mesh onto which the set of single-image probability values is projected. For instance, the weighting may comprise using the cosine of the angle.
According to another embodiment of the method, the weighting comprises assigning a confidence value to each set of single-image probability values. For instance, the weighted set of single-image probability values may be calculated by multiplying the respective set of single-image probability values and the confidence value.
A second aspect pertains to a computer system comprising a processing unit and a data storage unit, wherein the data storage unit is configured to receive and store input data, e.g. comprising input-image data, to store one or more algorithms, and to store and provide output data. The algorithms comprise at least an SfM algorithm and optionally also a machine-learning, deep-learning or feature-learning algorithm. The processing unit is configured to generate, based on the input data and using the algorithms, at least one land-cover map of an area as output data by performing the method according to the first aspect.
A third aspect pertains to a computer programme product comprising programme code which is stored on a machine-readable medium or embodied by an electromagnetic wave comprising a programme code segment, the programme code having computer-executable instructions for performing, in particular when executed on a processing unit of a computer system according to the second aspect, the method according to the first aspect.
The disclosure in the following will be described in detail by referring to exemplary embodiments that are accompanied by figures, in which:
The map 20′ depicted in
The cameras in
The input images can be captured at various locations, with different camera systems, under possibly different lighting conditions, and can span a range of resolutions. For instance, the ground sample distance (GSD) may vary between 2 and 15 cm from image to image. Preferably, the cameras 31-35 are calibrated, which allows a straightforward mapping between world points (points in the real world) and the pixels of the individual images capturing the respective world point.
In some embodiments, at least some of the cameras 31-35 are embodied as stereo cameras or range-imaging cameras providing depth information and/or allowing feature or topography extraction. Also, data from one or more LIDAR devices or 3D laser scanners (not shown here) may be used for providing depth or range information.
This approach, using a plurality of input images, allows more robust predictions compared to predictions based on single-view orthoimages and may be divided into two main stages.
In a first stage, the input images 11-15 are segmented into several semantic classes, i.e. pre-defined land-cover classes. This stage may be run on every input image 11-15 separately and includes determining probabilities of the land-cover classes for each pixel of each input image 11-15. The segmentation may be based on publicly available neural networks trained on data processed by a computer vision pipeline; this includes using a training dataset and various data augmentation techniques during training to ensure generality of the model. For the semantic segmentation of the images, publicly available up-to-date neural network architectures such as "DeepLab v3+" or "Hierarchical Multi-Scale Attention" may be used. Once the network is trained, every input image 11-15 is processed pixel by pixel or tile by tile and segmented into the desired classes.
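By way of illustration, the following minimal sketch shows how such per-pixel class probabilities can be obtained for a single input image with an off-the-shelf segmentation network. The torchvision DeepLabV3 model is used here merely as a stand-in for the trained land-cover network; in practice the network, its weights and its class set would be the ones trained for the land-cover classes described above.

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in model; a real deployment would load weights trained on the
# pre-defined land-cover classes rather than the default dataset classes.
model = deeplabv3_resnet50(weights="DEFAULT").eval()

def segment_probabilities(image: torch.Tensor) -> torch.Tensor:
    """Return per-pixel class probabilities of shape (num_classes, H, W).

    image: float tensor of shape (3, H, W) with values in [0, 1].
    """
    x = TF.normalize(image, mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225]).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)["out"]              # (1, num_classes, H, W)
    return logits.softmax(dim=1).squeeze(0)   # probabilities sum to 1 per pixel
```

Large aerial images would typically be passed through such a function tile by tile, as described above.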
The second stage is based on structure-from-motion (SfM) approaches combining the segmented images to generate a single 3D model (e.g. a mesh or point cloud). Optionally, generating the 3D model additionally comprises using depth or range information that is captured using 3D scanners (e.g. LIDAR), stereo cameras and/or range-imaging cameras.
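As a rough sketch of this stage, assume the SfM reconstruction has already produced a dense, oriented point cloud; the helper run_sfm below is hypothetical, standing in for an SfM pipeline such as COLMAP, while the surface reconstruction uses Open3D's actual API:

```python
import open3d as o3d

# Hypothetical helper standing in for an SfM pipeline; assumed to return
# dense 3D points (N, 3) and per-point normals (N, 3) as numpy arrays.
points, normals = run_sfm(input_images)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.normals = o3d.utility.Vector3dVector(normals)

# Poisson surface reconstruction turns the oriented point cloud into a
# triangle mesh of the area; depth controls the octree resolution.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
```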
The individually segmented images, together with the probabilities determined during semantic segmentation for each image, are then projected onto the 3D model, e.g. onto vertices of the 3D mesh created by the SfM algorithms. The projected probabilities are weighted according to the angle of incidence on the mesh and averaged.
Weighting the probabilities adds a confidence factor that is based on the respective angle of the image axis relative to the surface of the 3D mesh (or other 3D model) onto which the image pixel is projected. For instance, the probabilities of a certain image pixel may be weighted the higher, the more acute the impact angle of the respective image axis is relative to the mesh at the surface point onto which said image pixel is projected. In some embodiments, this weighting comprises using the cosine of the angle. Since each impact angle lies between 0° and 90°, the respective cosine values lie between 1 and 0, a value of 1 meaning the highest weighting and a value of 0 the lowest. Thus, acute angles, which have high cosine values, are weighted higher, whereas right angles are given the lowest weight.
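A minimal numpy sketch of this weighting and of the subsequent averaging is given below, under the assumption that the relevant angle is measured between the viewing ray of the camera and the surface normal of the mesh at the projection point (both as unit vectors):

```python
import numpy as np

def cosine_weight(view_dir: np.ndarray, normal: np.ndarray) -> float:
    """Confidence factor cos(angle) for one projected image pixel.

    view_dir: unit vector from the camera centre towards the surface point.
    normal:   unit surface normal of the mesh at that point.
    Returns a weight in [0, 1]: close to 1 for head-on rays (acute angle),
    close to 0 for grazing rays (right angle).
    """
    return abs(float(np.dot(view_dir, normal)))

def weighted_average(prob_sets: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Average K per-image probability sets (K, C) with K weights (K,)."""
    return (weights[:, None] * prob_sets).sum(axis=0) / weights.sum()
```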
Using the 3D mesh, the land-cover predictions are not limited to 2D space. This can be beneficial, for example, for the extraction of trees and buildings, i.e. to determine the land cover below roofs or vegetation.
The approach allows generating and presenting to a user for instance:
- combined land-cover maps 20 showing the most probable land-cover class for every pixel,
- per-class land-cover maps 21-23 showing the probability of one land-cover class for every pixel, and
- classified point clouds or meshes 24 as 3D models of the area.
The combined land-cover maps 20 and the per-class land-cover maps 21-23 may be displayed as 2D maps, whereas the classified point clouds or meshes 24 may be displayed as 3D maps. The 2D maps may either respect the occlusions by the 3D mesh in the orthographic view ("vision related") or ignore them, thus allowing the user to see under trees and overhangs of buildings ("ground related"), optionally showing the highest probability through all mesh layers without occlusions from the orthographic view.
For each pixel of a 2D map, a ray is created that runs in a vertical direction from the respective pixel through the mesh 25. This ray thus crosses the mesh 25 at one or more points.
For a vision-related (e.g. top-view) map, only the highest of those crossing points is used and the most probable class is chosen from the averaged probabilities. For a ground-related map, only the lowest of those crossing points is used and the most probable class is chosen from the averaged probabilities.
For a per-class land-cover map, the highest probability of the respective land-cover class is required separately for every pixel. Thus, for every pixel, the maximum probability of the given class over all crossing points may be used.
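These ray operations can be sketched with an actual ray-mesh intersection, e.g. using the trimesh library; the per-face probability table face_probs and the grid geometry are assumptions made for illustration only:

```python
import numpy as np
import trimesh

def classify_pixel(mesh, face_probs, x, y, z_top, mode="vision"):
    """Cast a vertical ray through map pixel (x, y) and pick a crossing point.

    face_probs: (num_faces, num_classes) averaged class probabilities per face.
    mode: "vision" uses the highest crossing point (respects occlusions),
          "ground" uses the lowest one (sees below trees and overhangs).
    """
    origins = np.array([[x, y, z_top]])
    directions = np.array([[0.0, 0.0, -1.0]])        # straight down
    locations, _, tri_idx = mesh.ray.intersects_location(origins, directions)
    if len(locations) == 0:
        return None                                  # ray misses the mesh
    order = np.argsort(locations[:, 2])              # crossings sorted by height
    face = tri_idx[order[-1]] if mode == "vision" else tri_idx[order[0]]
    return int(np.argmax(face_probs[face]))          # most probable class

def per_class_value(mesh, face_probs, x, y, z_top, cls):
    """Per-class map value: maximum probability of one class over all crossings."""
    _, _, tri_idx = mesh.ray.intersects_location(
        np.array([[x, y, z_top]]), np.array([[0.0, 0.0, -1.0]]))
    return float(face_probs[tri_idx, cls].max()) if len(tri_idx) else 0.0
```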
Additionally, by combining probabilities from different views, non-rigid objects such as moving cars can be identified in the scene. This information can be used to remove moving objects, which cause visually unpleasing effects, from the texturing. Such removal of moving vehicles from a texture is disclosed in the applicant's earlier application with the application number EP21204032.3. Similarly to their removal from the texture, moving objects may also be ignored in the land-cover information, so that the land-cover information of the ground beneath them is shown instead.
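One conceivable way of flagging such non-rigid objects, sketched here as an assumption and not as the method of the referenced application, is to test how strongly the per-view class predictions for the same surface point disagree:

```python
import numpy as np

def is_moving_object(per_view_probs: np.ndarray, min_agreement=0.5) -> bool:
    """Flag a surface point whose views disagree on its class.

    per_view_probs: (K, C) class probabilities from K views of the point.
    A car present in some views and absent in others yields inconsistent
    most-probable classes across the views, i.e. a low agreement ratio.
    """
    votes = np.argmax(per_view_probs, axis=1)
    agreement = np.bincount(votes).max() / len(votes)
    return agreement < min_agreement
```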
These per-class land-cover maps 21-23 may be generated for each land-cover class. In
The method starts with receiving 110 a plurality of digital input images of the area, e.g. from the cameras 31-35 shown in
A number of possible land-cover classes is detected for each pixel, and the probabilities of these possible land-cover classes are identified 130 for each pixel of each input image. A 3D mesh of the area is generated 140 using the input images and a structure-from-motion (SfM) algorithm. The identified probabilities are then projected 150 onto this mesh.
The probabilities provided by the single segmented images for each of their image pixels are then weighted 160 by applying a confidence factor that is based on the respective angle of the image axis relative to the mesh surface. In some embodiments, this weighting 160 comprises using the cosine of the angle. Since each angle lies between 0° and 90°, the respective cosine values lie between 1 and 0, a value of 1 meaning the highest weighting and a value of 0 the lowest. Right angles are thus weighted the lowest, and acute angles, which have high cosine values, are weighted higher. Consequently, the probabilities of a certain image pixel are weighted the higher, the more acute the angle of the respective image axis is relative to the mesh at the surface point onto which said image pixel is projected.
After the weighting 160 of the individual probabilities, overall probabilities of all land-cover classes can be determined 170 and assigned 180 to the pixels of the resulting land-cover map.
For instance, a certain pixel of the map may be visible in three input images. The probabilities of all classes in every image are then multiplied by the angle-dependent weighting value (confidence factor) of the respective image. The resulting values (incorporating both the probability and the confidence factor) of all images can then be used to determine the overall probabilities and to assign the most probable land-cover class to the respective pixel of the land-cover map.
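A worked numeric version of this example for one map pixel visible in three input images (the probability vectors and confidence factors are invented for illustration):

```python
import numpy as np

# Per-image class probabilities for one pixel, e.g. [road, grass, building].
probs = np.array([[0.70, 0.20, 0.10],   # image 1, near-perpendicular view
                  [0.50, 0.40, 0.10],   # image 2, oblique view
                  [0.30, 0.60, 0.10]])  # image 3, grazing view
conf = np.array([0.95, 0.60, 0.15])     # angle-dependent confidence factors

overall = (conf[:, None] * probs).sum(axis=0) / conf.sum()
print(overall)                  # -> approx. [0.59, 0.31, 0.10]
print(int(np.argmax(overall)))  # -> 0, i.e. "road" is assigned to the pixel
```

The grazing view, although it predicts "grass", barely influences the result because of its low confidence factor.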
Colours or other graphical indicators, such as brightness values or patterns, may be assigned to the land-cover classes, and a land-cover map may be displayed to a user, wherein each pixel has the colour assigned to its most probable land-cover class. The colours may be assigned through a user input or be pre-defined, e.g. chosen, at least partially, so as to allow the user to intuitively recognize the land-cover class from the displayed colour. For instance, trees might be assigned a green colour, streets a grey colour, etc.
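A short sketch of such a colour assignment; the class-to-colour table is an arbitrary example:

```python
import numpy as np

# One RGB colour per land-cover class, chosen so the user can recognize the
# class intuitively (streets grey, trees green, water blue).
palette = np.array([[128, 128, 128],    # class 0: street -> grey
                    [ 34, 139,  34],    # class 1: trees  -> green
                    [ 30, 144, 255]],   # class 2: water  -> blue
                   dtype=np.uint8)

def render_map(class_map: np.ndarray) -> np.ndarray:
    """Turn an (H, W) array of most-probable class indices into an RGB image."""
    return palette[class_map]            # fancy indexing -> (H, W, 3)
```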
Semantic segmentation is performed for each input image 11 of the plurality of input images 11-15 using an ML algorithm 44. The resulting segmented images 11′-15′ provide sets of single-image probability values 51-55, i.e. probability values for each pixel of the segmented image.
The segmented images 11′-15′ are projected onto the 3D mesh 25 and confidence values 61-65 are assigned to each pixel of the segmented images 11′-15′ based on the angle of the image axis of the respective projected segmented image relative to the mesh surface.
Based on the confidence values 61-65 and the sets of single-image probability values 51-55 of each pixel of each image, the probability values are averaged to obtain a set of overall probability values 50 for each pixel.
The sets of overall probability values 50 are then assigned to the pixels of the land-cover map(s) 20-24, which may be generated, optionally, based on a received orthoimage 10 of the area.
Although aspects are illustrated above, partly with reference to some preferred embodiments, it must be understood that numerous modifications and combinations of different features of the embodiments can be made. All of these modifications lie within the scope of the appended claims.
Foreign application priority data: EP 22185206.4 (regional), filed Jul. 2022.