The present disclosure relates to a computer-implemented method for determining three-dimensional coordinates describing a scene, a computer program, a computer program product, a device for determining three-dimensional coordinates describing a scene, and a display device for display of information relating to three-dimensional coordinates describing a scene. More specifically, the disclosure relates to such a method, computer program, computer program product, device and display device as defined in the introductory parts of the independent claims.
Geographical information systems constitute a fast-growing market in both the civilian and the military sectors. Knowledge about geographical conditions forms fundamental decision support for companies, authorities and the military. The geographical information can, for example, comprise digital maps having superposed information layers such as infrastructure, terrain type and different types of objects.
One object of the present invention is to improve the obtaining of 3D information from two-dimensional images.
This object has been achieved by means of a computer-implemented method as defined in claim 1.
Thus, according to a first aspect there is provided a computer-implemented method for determining 3D coordinates describing a scene, the method comprising obtaining a first image covering the scene captured from a first viewpoint, the first image being associated to a first timing of capture, obtaining a second image covering the scene captured from a second viewpoint different than the first viewpoint, the second image being associated to a second timing of capture, the second timing being different than the first timing, obtaining at least one third image covering the scene captured from a third viewpoint, the third image being associated to a third timing of capture, the third timing being different than the first and second timings, wherein the scene is formed in an overlapping geographical area of the first, second and third images, and wherein the first, second and third images are co-registered. The method further comprises obtaining intrinsic and/or extrinsic parameter values for at least one imaging device used for capture of the first image from the first viewpoint, the second image from the second viewpoint and the third image from the third viewpoint, and determining the 3D coordinates describing the scene, the determination comprising performing bundle adjustments based on pair-wise measurements on the first, second and third images so as to minimize a re-projection error of reconstructed 3D points in the overlapping geographical area of the first, second and third images, the performing of bundle adjustments comprising refining positions of the reconstructed 3D points and at least some of the obtained intrinsic and/or extrinsic parameter values, wherein time is used as a parameter in the determination of the 3D coordinates describing the scene, and wherein the re-projection error is minimized allowing a height to be locally time dependent in reconstruction of 3D points at least for a part of the overlapping geographical area.
Thus, time is used as a parameter in the determination of the 3D coordinates describing the scene. Thereby, a height of the reconstructed 3D points can be treated as a variable at least for a part of the overlapping geographical area. By reconstructing 3D points as defined herein, a possible change between the images is an additional parameter taken into consideration in the optimization problem, i.e. it is assumed that the actual height may be different in at least a part of the scene between the first, second and third images.
Allowing a height difference when minimizing the re-projection error enables more correct height determinations. Further, parts of a scene having time dependent characteristics can be identified. Thus, different types of vegetation, such as forests, can be identified in an otherwise static scene. Classification of land types can then be based on such identifications.
Also, other types of changes can be identified by allowing the height to be locally time dependent. Sliding movements/changes, such as those caused by earthquakes and shrinking/growing glaciers, can be identified. Such a sliding movement characteristically extends along a plane of the earth.
Further, when minimizing the re-projection error while using time as a parameter in the determination of the 3D coordinates describing the scene, the values of global parameters, characteristically at least some of the extrinsic parameters, can be refined. Thus, the herein described method can be used for improving the extrinsic and/or intrinsic parameter values for the at least one imaging device used to capture the first, second and third images.
The extrinsic parameters of a camera depend on its location and orientation. The intrinsic parameters of a camera depend on how the camera captures the images: parameters such as focal length, field of view and resolution govern the intrinsic matrix of a camera model.
Characteristically, the first, second and third viewpoints need to be known to some extent. This means that (extrinsic parameter) values for imaging device coordinates and a pointing direction of the imaging device are associated to the respective image. However, these values, or a selected subset of these values, can be adjustable by the algorithm. Thus, the adjustable values can be adjusted during the process of minimizing the re-projection error.
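As a purely illustrative sketch, and not as the claimed implementation, the intrinsic and extrinsic parameter values may be collected in a simple pinhole camera model, in which an intrinsic matrix K and an extrinsic pose (R, t) together project a world point to a pixel; all numeric values below are invented for the example:

```python
import numpy as np

# Intrinsic parameters (illustrative values): focal length in pixels
# and principal point, collected in the intrinsic matrix K.
fx = fy = 1200.0
cx, cy = 960.0, 540.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsic parameters: orientation R (pointing direction) and
# translation t (imaging device coordinates), here a trivial pose.
R = np.eye(3)
t = np.array([0.0, 0.0, 500.0])

def to_pixel(X):
    """Project a world point X into the image: x ~ K (R X + t)."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]

print(to_pixel(np.array([10.0, -5.0, 0.0])))  # pixel coordinates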
The third viewpoint can be different from the first and second viewpoints.
The first, second and third images are characteristically two-dimensional images.
The first and/or the second image and/or the third image may be a satellite image.
The method is described in relation to a first, second and third image. However, in practice, more images may be used.
The re-projection error may be minimized allowing a local height difference representing an increased height between a first timing corresponding to capture of the first image and subsequent second and third timings corresponding to capture of the second image and the third image, respectively.
The method can then be used for identification of forest areas and/or determination of characteristics of such forest areas.
The method may further comprise obtaining a first point in time at which the first image was captured, obtaining a second point in time at which the second image was captured and obtaining a third point in time at which the third image was captured, wherein any reconstructed 3D point with an identified local height difference between the first, second and third points in time is also associated to information relating to the first, second and third points in time.
For example, a rate of change may then be determined based on the identified height difference between the first, second and third points in time and the information relating to the first, second and third points in time. The rate of change may be a growth value representative of a growth per time unit.
The method may further comprise a step of estimating a height value for a fourth point in time based on the determined rate of change and based on said first and/or second and/or third points in time.
The computer-implemented method may comprise indicating that there is a height uncertainty for 3D coordinates for which a height difference in the reconstructed 3D points has been identified, or for which a height difference has been determined to lie outside an expected behaviour. This may occur, for example, if a change with time does not follow an expected behaviour such as a continuously increasing height, or if there is a discontinuous change such as the occurrence/disappearance of objects such as buildings.
The computer-implemented method may comprise a step of refining at least one of the obtained intrinsic and/or extrinsic parameter values.
The computer-implemented method may comprise indicating a scene change for 3D coordinates for which a height difference in the reconstructed 3D points has been identified or for which a height difference having predetermined change characteristics has been identified or for which a difference in the xy plane has been identified.
According to some embodiments, the method further comprises determining whether an identified height difference between the first, second and third timings follows an expected behaviour, wherein the expected behaviour is comprised in the group comprising at least one of
According to some embodiments, the computer-implemented method comprises storing the determined 3D coordinates describing the scene, wherein the stored determined 3D coordinates may comprise
According to a second aspect there is provided a computer program comprising a program code for performing the computer-implemented method as disclosed herein.
According to a third aspect there is provided a computer program product comprising a program code stored on a computer readable medium for performing the method as disclosed herein.
According to a fourth aspect there is provided a device for determining 3D coordinates describing a scene, the device comprising a memory arranged to store images and a processor arranged to obtain a first image covering the scene captured from a first viewpoint, the first image being associated to a first timing of capture, obtain a second image covering the scene captured from a second viewpoint different than the first viewpoint, the second image being associated to a second timing of capture, the second timing being different than the first timing, and obtain at least one third image covering the scene captured from a third viewpoint, the third image being associated to a third timing of capture, the third timing being different than the first and second timings, wherein the scene is formed in an overlapping geographical area of the first, second and third images, and wherein the first, second and third images are co-registered. The processor is further arranged to obtain intrinsic and/or extrinsic parameter values for at least one imaging device used for capture of the first image from the first viewpoint, the second image from the second viewpoint and the third image from the third viewpoint, and to determine the 3D coordinates describing the scene, the determination comprising performing bundle adjustments based on pair-wise measurements on the first, second and third images so as to minimize a re-projection error of reconstructed 3D points in the overlapping geographical area of the first, second and third images, the performing of bundle adjustments comprising refining positions of the reconstructed 3D points and at least some of the obtained intrinsic and/or extrinsic parameter values, wherein time is used as a parameter in the determination of the 3D coordinates describing the scene, and wherein the re-projection error is minimized allowing a height to be locally time dependent in reconstruction of 3D points at least for a part of the overlapping geographical area.
According to a fifth aspect there is provided a display device for display of information relating to 3D coordinates describing a scene, the display device comprising a display for display of the information and a memory storing data provided by a device for determining 3D coordinates describing a scene as disclosed herein.
Effects and features of the second through fifth aspects are to a large extent analogous to those described above in connection with the first aspect. Embodiments mentioned in relation to the first aspect are largely compatible with the second through fifth aspects.
The present disclosure will become apparent from the detailed description given below. The detailed description and specific examples disclose preferred embodiments of the disclosure by way of illustration only. Those skilled in the art understand from guidance in the detailed description that changes and modifications may be made within the scope of the disclosure.
Hence, it is to be understood that the herein disclosed disclosure is not limited to the particular component parts of the device described or steps of the methods described, since such device and method may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. It should be noted that, as used in the specification and the appended claims, the articles “a”, “an”, “the”, and “said” are intended to mean that there are one or more of the elements unless the context explicitly dictates otherwise. Thus, for example, reference to “a unit” or “the unit” may include several devices, and the like. Furthermore, the words “comprising”, “including”, “containing” and similar wordings do not exclude other elements or steps.
The above objects, as well as additional objects, features and advantages of the present disclosure, will be more fully appreciated by reference to the following illustrative and non-limiting detailed description of examples, when taken in conjunction with the accompanying drawings.
The present disclosure will now be described with reference to the accompanying drawings, in which preferred example embodiments of the disclosure are shown. The disclosure may, however, be embodied in other forms and should not be construed as limited to the herein disclosed examples. The disclosed examples are provided to fully convey the scope of the disclosure to the skilled person.
The computer-implemented method 100 comprises obtaining S1 a first image covering the scene captured from a first viewpoint, the first image being associated to a first timing of capture.
The computer-implemented method may comprise obtaining S2 a first point in time at which the first image was captured.
The computer-implemented method 100 further comprises obtaining S3 a second image covering the scene captured from a second viewpoint different than the first viewpoint, the second image being associated to a second timing of capture, the second timing being different than the first timing.
The computer-implemented method 100 may comprise obtaining S4 a second point in time at which the second image was captured.
The computer-implemented method 100 further comprises obtaining S5 at least one third image covering the scene captured from a third viewpoint, the third image being associated to a third timing of capture, the third timing being different than the first and second timings. The third viewpoint may be different than the first and second viewpoints.
The computer-implemented method 100 may comprise obtaining S6 a third point in time at which the third image was captured.
Thus, the scene for which 3D coordinates are to be described is formed in an overlapping geographical area of the first, second and third images. The first, second and third images are co-registered.
The obtained first, second and third images are characteristically two-dimensional images. The obtained first, second and third images are characteristically digital images comprising a plurality of pixels each having a pixel value. The obtained first image and/or the second image and/or the third image may be a satellite image.
The method further comprises obtaining S7 intrinsic and/or extrinsic parameter values for at least one imaging device used for capture of the first image from the first viewpoint, the second image from the second viewpoint and the third image from the third viewpoint.
The obtained intrinsic and/or extrinsic parameter values comprise at least viewpoint data, i.e. the respective image is associated to an imaging device coordinate value and a pointing direction value. In practice, this means that the first, second and third viewpoints are known to some extent. However, these values, or a selected subset of these values, can be adjustable. Thus, the adjustable values can be adjusted during a process of minimizing a re-projection error; if the viewpoint coordinate value and/or pointing direction value is uncertain, these values can be adjusted. This means that satellite images for which the viewpoint coordinate values and/or pointing direction values are uncertain can still be used in the method as discussed herein. The process of minimizing the re-projection error will be described in more detail below.
As the intrinsic and/or extrinsic parameter values, including satellite viewpoints, are generally known, this information can be used to geo-reference the obtained images. This means that the obtained satellite images are each associated to coordinate data in a global coordinate system, wherein 2D positions on the ground are given as latitude and longitude or equivalents thereof. The coordinate data may be two-dimensional or three-dimensional.
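A minimal sketch of such geo-referencing, assuming each image is delivered with a GDAL-style affine geotransform (the six coefficients below are invented for the example):

```python
def pixel_to_geo(col, row, gt):
    """Map pixel (col, row) to ground coordinates, e.g. longitude/latitude."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

# Origin at 18.0 degrees E, 59.5 degrees N, 1e-5 degree pixel size (made up).
gt = (18.0, 1e-5, 0.0, 59.5, 0.0, -1e-5)
lon, lat = pixel_to_geo(1024, 768, gt)
```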
The method further comprises determining S8 the 3D coordinates describing the scene. The determination comprises performing bundle adjustments based on pair-wise measurements on the first, second and third images so as to minimize a re-projection error of reconstructed 3D points in the overlapping geographical area of the first, second and third images.
In photogrammetry and computer stereo vision, bundle adjustment refers to the simultaneous refinement of the 3D coordinates describing the scene geometry, the parameters of the relative motion, and the optical characteristics of the imaging device(s) employed to acquire the images, given a set of images depicting a number of 3D points from different viewpoints. The name refers to the geometrical bundles of light rays originating from each 3D feature and converging on each imaging device's optical center, which are adjusted optimally according to an optimality criterion involving the corresponding image projections of all points.
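Expressed as a formula, the optimality criterion is commonly the sum of squared re-projection errors, where $x_{ij}$ is the observed projection of the reconstructed 3D point $X_i$ in image $j$ and $\pi(\,\cdot\,;\theta_j)$ projects through the camera model with intrinsic and extrinsic parameter values $\theta_j$:

$$\min_{\{X_i\},\,\{\theta_j\}} \; \sum_{i,j} \bigl\| x_{ij} - \pi(X_i;\theta_j) \bigr\|^{2}.$$

In the method disclosed herein, the height component of $X_i$ is additionally allowed to depend on the capture time $t_j$, i.e. $X_i(t_j) = (x_i, y_i, h_i(t_j))$, at least for a part of the overlapping area.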
The performing of bundle adjustments as discussed herein may be made using methods known in the art, involving refining positions of the reconstructed 3D points and at least some of the obtained intrinsic and/or extrinsic parameter values to minimize the re-projection error of the reconstructed 3D points. For example, the refining of the intrinsic and/or extrinsic parameter values may involve refining viewpoint coordinate values and/or pointing direction values for at least some of the images used.
However, in accordance with the solution as presented herein, time is used as a parameter in the determination of the 3D coordinates describing the scene. This means that the re-projection error is minimized allowing a height to be locally time dependent in reconstruction of 3D points at least for a part of the overlapping geographical area.
The term “locally time dependent” in the context of the present disclosure means that the height is time dependent in one or a plurality of parts of an overlapping area for the first, second and third images. Thus, characteristically, the height is not time dependent in the whole overlapping area.
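As a purely illustrative sketch of how such a time-aware bundle adjustment could be set up, assuming a simple pinhole model and a linear local height model (the names, the parameter layout and the solver choice are assumptions of this example, not the claimed implementation), each reconstructed 3D point is given a base height h0 and a local growth rate g, and the re-projection residuals are handed to a non-linear least-squares solver:

```python
import numpy as np
from scipy.optimize import least_squares

def project(X, rvec, tvec, K):
    """Pinhole projection of a 3D point; rvec is a Rodrigues rotation vector."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        S = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)
    p = R @ X + tvec
    return K[:2, :2] @ (p[:2] / p[2]) + K[:2, 2]

def residuals(params, observations, times, K, n_points):
    # Assumed layout: per point (x, y, h0, g), then per image (rvec, tvec).
    pts = params[:4 * n_points].reshape(n_points, 4)
    cams = params[4 * n_points:].reshape(-1, 6)
    res = []
    for i, j, uv in observations:           # point i seen in image j at pixel uv
        x, y, h0, g = pts[i]
        h = h0 + g * (times[j] - times[0])  # height is locally time dependent
        res.append(project(np.array([x, y, h]), cams[j, :3], cams[j, 3:], K) - uv)
    return np.concatenate(res)

# solution = least_squares(residuals, x0, args=(observations, times, K, n_points))
```

Points in static parts of the area would be expected to come out with a growth rate g near zero, while, for example, forest areas would receive a positive rate; fixing or freeing entries of the camera blocks corresponds to selecting which extrinsic parameter values are adjustable.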
Even changes such as those caused by earthquakes and shrinking/growing glaciers can be identified by allowing the height to be locally time dependent.
The re-projection error may be minimized allowing a height difference representing an increased height between a first timing corresponding to capture of the first image and subsequent second and third timings corresponding to capture of the second image and the third image, respectively.
Further, the computer-implemented method 100 may comprise a step of determining S9 a rate of change based on the identified height difference between the first, second and third points in time and the information relating to the first, second and third points in time. The rate of change may be a growth value representative of a growth per time unit.
The computer-implemented method 100 may further comprise a step of estimating S10 a height value for a fourth point in time based on the determined rate of change and based on said first and/or second and/or third points in time.
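A minimal sketch of the steps S9 and S10, assuming a linear growth model and using invented example values:

```python
import numpy as np

t = np.array([2016.5, 2019.2, 2022.8])  # first, second and third points in time
h = np.array([12.1, 14.0, 16.4])        # reconstructed heights in metres

growth_per_year, intercept = np.polyfit(t, h, 1)  # rate of change (S9)

t4 = 2025.0                                       # fourth point in time
h_estimate = intercept + growth_per_year * t4     # estimated height (S10)
```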
The computer-implemented method 100 may further comprise indicating S12 that there is a height uncertainty for 3D coordinates for which a height difference in the reconstructed 3D points has been identified.
The computer-implemented method may comprise a step of determining S14 whether an identified height difference is outside an expected behaviour, for example determining whether a change with time does not follow an expected behaviour such as a continuously increasing height, or whether there is a discontinuous change such as the occurrence/disappearance of objects such as buildings. When it has been determined that the identified height difference is outside an expected behaviour, it may be indicated S12 that there is a height uncertainty.
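A minimal sketch of such a check, assuming a continuously increasing height as the expected behaviour and an invented tolerance:

```python
import numpy as np

def follows_continuous_increase(heights, tolerance=0.5):
    """True if the height series is non-decreasing within a tolerance (metres)."""
    return bool(np.all(np.diff(np.asarray(heights, dtype=float)) >= -tolerance))

# A jump such as a building appearing/disappearing fails the check,
# and the corresponding 3D point would be indicated as height-uncertain (S12).
uncertain = not follows_continuous_increase([12.1, 14.0, 2.3])
```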
The computer-implemented method 100 may comprise determining S14 whether an identified height difference between the first, second and third timings follows an expected behaviour. The expected behaviour may be comprised in the group comprising at least one of
The computer-implemented method may further comprise refining S11 at least one of the obtained intrinsic and/or extrinsic parameter values. This has been discussed above.
The computer-implemented method may further comprise storing S15 the determined 3D coordinates describing the scene, wherein at least some of the stored determined 3D coordinates may comprise
The at least one memory 202 may store the program instructions. The at least one memory 202 may comprise an image library storing images such as satellite images for use in determination of the 3D coordinates describing a scene. The at least one memory may store the determined coordinates describing the scene. Especially, some of the stored determined 3D coordinates may comprise
The device 200 for determining 3D coordinates describing a scene may further comprise a communication interface 203. The device 200 may receive images for use in determining the 3D coordinates describing the scene via the communication interface 203. The received images may be stored in the image library or processed upon reception. Further, the device 200 may transmit via the communication interface 203 information relating to the results of the processing by the processor 201. The results may be obtained from the at least one memory 202.
The display device 300 comprises or is connected to at least one memory 302. The memory 302 stores data provided by a device 200 for determining 3D coordinates describing a scene for example as discussed in relation to
The display device 300 may further comprise a user input interface 303 such as a touch screen, keyboard, mouse, or the like for user input of a request for information.
Characteristically, a processor 301 controls retrieval of data from the memory 302 and processing of a user input request to find the requested information relating to the determined 3D coordinates describing the scene.
The images are captured from different postures or viewpoints in relation to the scene 416. As stated above, different image capture devices may have captured the different images, or the same image capture device may have been used. In the latter case, the platform on which the image capture device is supported has moved between the different viewpoints.
The image capture device(s) at the respective posture or viewpoint 410, 411, 412 each have a field of view 413, 414, 415. The scene 416 covered by the images is the part of the geography within the field of view of the image capture device(s) from all three postures or viewpoints which is also visible in all three images. Thus, in the illustrated example, the part 417 is not part of the scene, as this part is obscured, and thus not visible, from a first viewpoint 410 of the viewpoints 410, 411, 412, and partly obscured, and thus partly not visible, from a second viewpoint 411 of the viewpoints 410, 411, 412.
A first image has been obtained covering a scene captured from a first viewpoint. The first image is associated to a first timing of capture. Further, a second image covering the scene captured from a second viewpoint different than the first viewpoint has been obtained. The second image is associated to a second timing of capture, said second timing being different than the first timing. Further, at least one third image covering the scene captured from a third viewpoint has been obtained. The at least one third image is associated to a third timing of capture, said third timing being different than the first and second timings. The third viewpoint may be different than the first and second viewpoints.
The scene covered by the images is a part of the geography within the field of view of image capture device(s) from all three postures or viewpoints which is also visible in all three images. The scene further illustrated in relation to
The first, second and third images are co-registered. The first, second, and third images are preferably geo-referenced.
Each image comprises a plurality of points or pixels. A typical distance d between the points or pixels may be between 30 and 300 cm, for example 100-150 cm or 50-130 cm for satellite-based images. For airborne images, the scale may be different.
Each of the points or pixels, or a plurality of them, has a corresponding point or pixel in the other two images. It is assumed that this correspondence between the images has been established. The correspondence may have been established pair-wise. Thus, the correspondence between points/pixels of the first and second images has been established, the correspondence between points/pixels of the first and third images has been established, and the correspondence between points/pixels of the second and third images has been established.
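As one illustration of how the pair-wise correspondence could have been established beforehand (the method itself only assumes that it exists), feature matching in the style of OpenCV ORB could be used:

```python
import cv2

def match_pair(img_a, img_b, n_features=5000):
    """Return matched keypoint coordinates between two grayscale images."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches]

# Pair-wise correspondences for three (hypothetical) images img1, img2, img3:
# match_pair(img1, img2), match_pair(img1, img3), match_pair(img2, img3)
```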
In the illustrated example,
The same correspondence may have been established for all of the points/pixels (denoted “x”) of the images that are within the scene, or for a subset of the points/pixels of the images within the scene.
For each point/pixel 520, 521, 522 in each of the images for which correspondence between the images has been established, the 3D coordinate describing the scene is determined. The determination comprises performing bundle adjustments based on pair-wise measurements on the first, second and third images so as to minimize a re-projection error of reconstructed 3D points in the overlapping geographical area of the first, second and third images, said performing of bundle adjustments comprising refining positions of the reconstructed 3D points and at least some of the intrinsic and/or extrinsic parameter values. Time is used as a parameter in the determination of the 3D coordinates describing the scene. The re-projection error is minimized allowing a height to be locally time dependent in reconstruction of 3D points at least for a part of the overlapping geographical area. The determined 3D points are characteristically georeferenced.
For some determined 3D points, the height may have changed between capture of the images, and for other 3D points the height may not have changed.
For a first part 530 of the landscape, the height of the reconstructed 3D points varies with time. The variation with time can be the same within this first part, or it can change with the location within this first part 530. When the height changes with time, it may for example change in a way which is characteristic of a growing forest. Thus, different types of vegetation, such as forests, can be identified or used as a feature in a Land Use/Land Cover, LULC, classifier. Classification of land types can then be based on such identifications.
For a second part 531 of the landscape, the reconstructed 3D points are characteristically not varying with time. Such parts of the geography are suitable for refinement of the values of global parameters, characteristically at least some of the extrinsic parameters of an image capture device which has captured the image.
For a third part 532 of the landscape, a variation with time is characteristically not continuous. For example, a house may be present at one timing and then, at a next timing, the house has been torn down. Alternatively, the house is not there at one timing and at the next it has been built.
Also, other types of changing areas, such as those affected by earthquakes and glaciers, can be identified by allowing the height to be time dependent.
However, the method as disclosed herein defines that the height is allowed to be locally time dependent. This means that the method does not allow all parts of the landscape to have height changing with time.
Thus, different types of changes can be identified, and classification of land types can then be based on such identifications.
Which types of changes can be identified depends on the model used for the time dependency. For example, the time dependency can relate only to height or also to the x and y axes. Further, the time dependency can be modelled to
The various aspects, embodiments, implementations or features of the described embodiments can be used separately or in any combination. Various aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software. The methods and operations may in some embodiments be implemented as computer readable code on a computer readable medium, which is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include read-only memory, random-access memory, CD-ROMs, DVDs, magnetic tape and optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The person skilled in the art realizes that the present disclosure is not limited to the examples described above. The person skilled in the art further realizes that modifications and variations are possible within the scope of the appended claims. Additionally, variations to the disclosed examples can be understood and effected by the skilled person in practicing the claimed disclosure, from a study of the drawings, the disclosure, and the appended claims.
Priority application: 22204916.5, filed Nov 2022, EP (regional).