The present disclosure relates to the field of electronic information technology, and in particular to a method and a device for data processing, and a storage medium.
Nowadays, geographic information systems are widely applied. They involve a large amount of spatial geographic data, much of which is duplicated. Therefore, geographic data often needs to be thinned before use, while the features of the geographic data are preserved to a certain extent.
A curve thinning algorithm (Douglas-Peucker) is usually adopted for thinning. Coordinate points included in the geographic data are three-dimensional data, which may manifest as a sharply fluctuating curve in a three-dimensional space. However, with the existing technology the fluctuating curve is thinned into a straight line, so that elevation information in the geographic data is lost, the thinned geographic data is inaccurate, and further errors may occur when the geographic data is processed.
According to a first aspect, a method for data processing is provided in embodiments of the present disclosure. The method includes:
According to a second aspect, a device for data processing is provided in embodiments of the present disclosure. The device includes:
According to a third aspect, a non-transitory computer-readable storage medium having a computer program stored thereon is provided in embodiments of the present disclosure. The computer program is executed by a processor to implement a method for data processing, including:
The drawings herein are incorporated into the specification and constitute a part of the specification, show embodiments in conformity with embodiments of the present disclosure, and explain the principle of the present disclosure together with the specification.
In order to understand the above purpose, features and advantages of the present disclosure more clearly, embodiments of the present disclosure are further described below. It should be noted that embodiments of the present disclosure and features in the embodiments may be combined with each other where no conflict arises.
Numerous specific details are set forth in the following description to facilitate a thorough understanding of the present disclosure. However, the present disclosure may also be implemented in ways other than those described herein. Obviously, the embodiments in the specification are only a part of the embodiments of the present disclosure rather than all embodiments.
In the related art, in the field of geographic information systems (GIS), it is often necessary to thin geographic data. Since the obtained GIS data has a large error in the elevation direction, the GIS data are generally thinned by using a curve thinning algorithm in a two-dimensional space. However, much of the data in geographic coordinate systems consists of curves in a three-dimensional space that are projected as straight lines on an XOY plane while fluctuating sharply along a Z axis. Therefore, directly thinning the GIS data pulls the fluctuating curves into straight lines, which may easily result in loss of elevation information in the GIS data.
Specifically,
At S210, a point sequence including a plurality of coordinate points is acquired.
It may be understood that a point sequence in a universal transverse Mercator (UTM) grid system may be acquired, and the point sequence includes a plurality of coordinate points. The coordinate points are three-dimensional data, and include latitude and longitude information and elevation information.
At S220, the point sequence is interpolated, and a multi-dimensional data structure tree is constructed by using the interpolated point sequence.
In an embodiment, interpolating the point sequence, and constructing the multi-dimensional data structure tree by using the interpolated point sequence includes: performing linear interpolation based on the coordinate points in the point sequence; and constructing the multi-dimensional data structure tree based on the coordinate points in the interpolated point sequence.
It may be understood that, on the basis of S210 above, linear interpolation is performed based on the coordinate points in the obtained point sequence, and the multi-dimensional data structure tree is constructed by using the coordinate points in the interpolated point sequence. The interpolated point sequence includes more coordinate points than the un-interpolated point sequence. The multi-dimensional data structure tree refers to a K-D tree (an abbreviation of K-dimensional tree), which is a data structure that partitions a K-dimensional data space. It is mainly applied to searching for key data in a multi-dimensional space; for example, a range search and a nearest neighbor search may be performed by using the K-D tree. The K-D tree may be understood as a special case of a binary space partitioning tree.
At S230, a first point sequence is obtained by thinning the point sequence and interpolating the thinned point sequence.
It may be understood that, on the basis of S220 above, the original point sequence obtained at the beginning may be thinned by using a data thinning algorithm, and interpolation, for example linear interpolation, may be performed on the thinned point sequence. The interpolation method used is not limited; however, the same interpolation method needs to be adopted for both the thinned point sequence and the original point sequence. The first point sequence is thus obtained by thinning followed by interpolation, whereas the multi-dimensional data structure tree is constructed from the interpolated original point sequence.
At S240, elevations of the first point sequence are obtained based on the multi-dimensional data structure tree and the first point sequence.
It may be understood that, on the basis of the above S220 and S230, elevations of all coordinate points in the first point sequence are obtained based on the multi-dimensional data structure tree and the first point sequence. The first point sequence includes coordinate points retained after the original point sequence is thinned and coordinate points generated after interpolation. The elevation refers to a height of a certain point relative to a datum plane.
A method for data processing is provided in the embodiments of the present disclosure. A point sequence including a plurality of coordinate points is acquired; the point sequence is interpolated, and a multi-dimensional data structure tree is constructed by using the interpolated point sequence; a first point sequence is obtained by thinning the point sequence and interpolating the thinned point sequence; and elevations corresponding to all coordinate points in the first point sequence are obtained based on the multi-dimensional data structure tree and the first point sequence. In the method for data processing, the point sequence is interpolated, and a multi-dimensional data structure tree is constructed by using the interpolated point sequence, and a first point sequence is obtained by thinning the original point sequence and interpolating the thinned point sequence; and elevation information of each coordinate point in the first point sequence is calculated based on the multi-dimensional data structure tree, to complete data thinning processing, which effectively reduces a storage space of data, ensures elevation information in the thinned point sequence to a maximum extent, and performs smooth processing on the elevation information, with a fast calculation speed and a high data processing efficiency.
On the basis of the above embodiments,
At S310, a Euclidean distance between any two coordinate points in the point sequence is calculated.
It may be understood that calculating the Euclidean distance between any two coordinate points in the original point sequence may mean calculating the Euclidean distance between the two coordinate points in a two-dimensional space. The Euclidean distance may be calculated by the following equation (1):
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \quad (1)$$
where d represents the Euclidean distance, and the two coordinate points in the point sequence are A(x1,y1,z1) and B(x2,y2,z2). Only the Euclidean distance in the two-dimensional space between the two coordinate points is calculated, that is, the distance over the x and y coordinates.
At S320, an interpolation number is obtained based on the Euclidean distance and a preset density.
In an embodiment, obtaining the interpolation number based on the Euclidean distance and the preset density includes: obtaining an interpolation number of coordinate points to be interpolated in the point sequence based on a product of the Euclidean distance and the preset density.
It may be understood that, on the basis of S310, the interpolation number is obtained based on the product of the Euclidean distance between any two coordinate points in the original point sequence and the preset density. The preset density may be, for example, 4 interpolated points per meter, and the interpolation number is an integer. The interpolation number refers to the number of coordinate points that may be interpolated for any two coordinate points A(x1,y1,z1) and B(x2,y2,z2) in the original point sequence. The interpolation number may be calculated by the following equation (2):
where, num is the interpolation number, d is the Euclidean distance between two coordinate points A(x1,y1,z1) and B(x2,y2,z2), and k is the preset density.
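Steps S310 and S320 can be illustrated with a minimal Python sketch. The function names `planar_distance` and `interpolation_number` are hypothetical, and truncating the product toward zero is an assumption: the description only states that the interpolation number is an integer obtained from the product of the distance and the preset density.

```python
import math


def planar_distance(a, b):
    """Two-dimensional Euclidean distance between 3-D points a and b.

    Only the x and y components are used; the elevation (z) is ignored.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1])


def interpolation_number(a, b, density):
    """Integer number of points to interpolate for the pair (a, b).

    Assumption: the product d * k is truncated to an integer. The source
    only says the result is an integer, not how it is rounded.
    """
    return int(planar_distance(a, b) * density)


if __name__ == "__main__":
    a, b = (1.0, 2.0, 3.0), (3.0, 4.0, 5.0)
    print(round(planar_distance(a, b), 3))          # 2.828
    print(interpolation_number(a, b, density=4.0))  # 11
```

With a different rounding rule or a different density the integer result changes, so the value printed here should be read only as an instance of the assumed rule.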
At S330, an interpolated point sequence is generated based on the interpolation number and the any two coordinate points in the point sequence.
In an embodiment, generating the interpolated point sequence based on the interpolation number and the any two coordinate points in the point sequence includes: obtaining coordinate points to be interpolated corresponding to the interpolation number based on a difference value between the any two coordinate points in the point sequence and the interpolation number; and generating the interpolated point sequence based on the coordinate points to be interpolated and the coordinate points in the point sequence.
It may be understood that, on the basis of S320 above, the difference value in the two-dimensional space between any two coordinate points in the original point sequence is calculated, and each coordinate point to be interpolated is calculated based on the difference value and the interpolation number. The number of coordinate points to be interpolated is the same as the interpolation number. A calculated coordinate point to be interpolated may lie between the two coordinate points in the original point sequence, or may lie around the two coordinate points. The coordinate points to be interpolated and the corresponding original point sequence are stored to obtain the interpolated point sequence. Each coordinate point to be interpolated may be calculated by the following equation (3):
where i denotes the i-th of the num coordinate points to be interpolated, and x, y and z are the three-dimensional data of the i-th coordinate point, calculated from the interpolation number and the two coordinate points A and B in the point sequence. Each coordinate point to be interpolated is calculated in this way, and the number of all calculated coordinate points to be interpolated is the same as the interpolation number.
For example, the coordinates of the two coordinate points A(x1,y1,z1) and B(x2,y2,z2) in the original point sequence are (1, 2, 3) and (3, 4, 5), the calculated Euclidean distance d is 2.828, and the interpolation number num is 8. That is, 8 coordinate points may be inserted into the original point sequence based on the coordinate points A and B, in which the first of the 8 coordinate points is (0.2, 0.2, 0.2), and the eighth is (1.8, 1.8, 1.8).
In the method for data processing provided in the embodiments of the disclosure, a Euclidean distance between the any two coordinate points in the point sequence is calculated, an interpolation number is obtained based on the Euclidean distance and a preset density, and an interpolated point sequence is generated based on the interpolation number and the any two coordinate points in the point sequence. Values of a plurality of approximate coordinate points around the any two coordinate points in the original point sequence may be obtained by a linear interpolation, which facilitates searching accurate coordinate points around the coordinate points.
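The interpolation of S310 to S330 can be sketched as follows. This is an illustration under stated assumptions, not the formula of equation (3): it assumes conventional linear interpolation in which the new points are evenly spaced strictly between two consecutive points of the sequence, and it assumes the interpolation number is the truncated product of distance and density. The names `interpolate_segment` and `interpolate_sequence` are hypothetical.

```python
import math


def interpolate_segment(a, b, density):
    """Return the points inserted between 3-D points a and b.

    Assumptions: the count is int(d * density), where d is the planar
    (x, y) distance, and the points are evenly spaced strictly between
    a and b on all three coordinates.
    """
    d = math.hypot(b[0] - a[0], b[1] - a[1])
    num = int(d * density)
    inserted = []
    for i in range(1, num + 1):
        t = i / (num + 1)  # fraction of the way from a to b
        inserted.append(tuple(a[j] + t * (b[j] - a[j]) for j in range(3)))
    return inserted


def interpolate_sequence(points, density):
    """Interpolate every pair of consecutive points, keeping the originals."""
    if not points:
        return []
    result = [points[0]]
    for a, b in zip(points, points[1:]):
        result.extend(interpolate_segment(a, b, density))
        result.append(b)
    return result


if __name__ == "__main__":
    dense = interpolate_sequence([(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)], density=4.0)
    print(len(dense))  # 6: two original points plus four inserted points
```

Treating "any two coordinate points" as consecutive pairs keeps the interpolated sequence ordered along the curve; other pairings would need a different loop.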
On the basis of the above embodiments,
At S410, a median of the coordinate points in the interpolated point sequence is calculated.
It may be understood that, the median of two-dimensional spaces of the coordinate points in the point sequence obtained by interpolating the original point sequence is calculated. A calculation method of the median may be an equation (4):
where, j represents a dimension, and midj represents a median of all data in a j dimension.
It may be understood that, a median of two dimensions xoy in the original point sequence is calculated, with x as a first dimension, and y as a second dimension.
At S420, the multi-dimensional data structure tree is generated by dividing the coordinate points in the point sequence into two regions based on the median until it is unable to determine a region for the coordinate points in the point sequence.
It may be understood that, on the basis of S410 above, all coordinate points in the original point sequence are divided into two regions based on the calculated median mid1: coordinate points smaller than the median form a left region, and coordinate points greater than the median form a right region. A median of the left region and a median of the right region are then calculated respectively, and each of the two regions is further divided into two regions based on its own median, at which point the entire original point sequence is divided into four regions. The division continues until clear left and right regions can no longer be determined for the coordinate points in the point sequence, whereupon the construction of the multi-dimensional data structure tree is completed.
For example, taking
In the method for data processing provided in the embodiments of the disclosure, a median of the coordinate points in the interpolated point sequence is calculated, the coordinate points in the point sequence are divided into two regions based on the median, and until it is unable to determine a region for the coordinate points in the point sequence, a multi-dimensional data structure tree is generated, which may reduce a calculation data volume and a data search time and accelerate data processing for the convenience of subsequent data search.
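The median-based division of S410 and S420 can be sketched as a recursive K-D tree build. The sketch assumes that the splitting dimension alternates between x and y at successive levels and that the elevation z is carried along as data rather than used for splitting; the names `KDNode` and `build_kdtree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class KDNode:
    point: Point                      # median point stored at this node
    axis: int                         # 0 = split on x, 1 = split on y
    left: Optional["KDNode"] = None   # points below the median on `axis`
    right: Optional["KDNode"] = None  # points at or above the median


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median of the current axis.

    Assumption: the axis alternates x, y, x, ... with depth. Recursion
    stops when a region is empty, i.e. it can no longer be divided.
    """
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


if __name__ == "__main__":
    pts = [(2, 3, 1), (5, 4, 2), (9, 6, 3), (4, 7, 4), (8, 1, 5), (7, 2, 6)]
    root = build_kdtree(pts)
    print(root.point, root.left.point, root.right.point)
```

Sorting at every level keeps the sketch short; a production build would typically use a selection algorithm to find the median without a full sort.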
On the basis of the above embodiments,
At S610, a straight line between a minimum coordinate point and a maximum coordinate point in the point sequence is constructed.
It may be understood that all coordinate points in the original point sequence are examined; a start coordinate point of the curve constituted by the coordinate points, that is, the minimum coordinate point, and an end coordinate point, that is, the maximum coordinate point, are determined; and a straight line connecting the minimum coordinate point and the maximum coordinate point is constructed.
At S620, distances between the coordinate points in the point sequence and the straight line are counted, and a maximum distance is determined.
It may be understood that, the distances between all the coordinate points in the point sequence and the straight line are calculated, the distances corresponding to all the coordinate points are counted, and the maximum distance is determined.
At S630, whether the maximum distance is greater than a preset threshold is determined, and if yes, a coordinate point corresponding to the maximum distance is retained; if not, a coordinate point corresponding to the straight line in the point sequence is retained.
It may be understood that whether the maximum distance is greater than the preset threshold is determined. If yes, the coordinate point corresponding to the maximum distance is retained, the curve constituted by all coordinate points in the original point sequence is divided into two parts with that coordinate point as the boundary, and the above operations are repeated on each part. If not, all coordinate points in the original point sequence that are not on the straight line are deleted, and only the coordinate points on the straight line are retained.
In the method for data processing provided in the embodiments of the disclosure, a straight line between a minimum coordinate point and a maximum coordinate point in the point sequence is constructed, distances between the coordinate points in the point sequence and the straight line are counted, and a maximum distance is determined; and whether the maximum distance is greater than a preset threshold is determined, and if yes, the coordinate point corresponding to the maximum distance is retained; if not, the coordinate point corresponding to the straight line in the point sequence is retained, which may thin a large number of coordinate points included in data and reduce a data volume.
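The thinning of S610 to S630 follows the general shape of the Douglas-Peucker algorithm, which can be sketched as below. The sketch assumes that distances to the straight line are measured in the x-y plane and that, when no point exceeds the threshold, only the two end points of the segment are kept; the names `point_line_distance` and `douglas_peucker` are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance in the x-y plane from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate line: fall back to the distance to the single point a.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / length


def douglas_peucker(points, threshold):
    """Thin an ordered list of 3-D points, keeping the full (x, y, z) tuples."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = point_line_distance(points[i], first, last)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist > threshold:
        # Keep the farthest point and repeat on the two parts it separates.
        left = douglas_peucker(points[: index + 1], threshold)
        right = douglas_peucker(points[index:], threshold)
        return left[:-1] + right  # drop the duplicated boundary point
    return [first, last]


if __name__ == "__main__":
    curve = [(0, 0, 1), (1, 0, 5), (2, 0, 2), (3, 0, 7)]
    print(douglas_peucker(curve, threshold=0.5))  # [(0, 0, 1), (3, 0, 7)]
```

The usage example shows the problem described in the related art: a curve that is straight in the x-y plane collapses to its end points, and the intermediate elevations are dropped.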
On the basis of the above embodiments,
At S710, for any one coordinate point in the first point sequence, a preset number of coordinate points close to the coordinate point are determined in the multi-dimensional data structure tree.
It may be understood that, for any one coordinate point C(x0,y0,z0) in the first point sequence generated by thinning and interpolation, a preset number k of coordinate points close to C(x0,y0,z0) are determined in the multi-dimensional data structure tree obtained by interpolating the original point sequence. The method for interpolating the thinned point sequence is the same as the method for directly interpolating the original point sequence, and the above linear interpolation method may be adopted.
At S720, weights corresponding to sub-coordinate points in the preset number of coordinate points are determined based on the coordinate point and the sub-coordinate points.
In an embodiment, determining the weights corresponding to the sub-coordinate points in the preset number of coordinate points based on the coordinate point and the sub-coordinate points includes: calculating difference values between the coordinate point and the sub-coordinate points in the preset number of coordinate points, and mapping the difference values to weights; and determining the weights corresponding to the sub-coordinate points by normalizing the weights.
It may be understood that, on the basis of S710 above, the sub-coordinate points are the coordinate points in the preset number of coordinate points found for a coordinate point in the first point sequence. A value of the preset number may, for example, correspond to the above preset density. Preferably, when the preset number is 4, the nearest 4 coordinate points are found in the multi-dimensional data structure tree based on C(x0,y0,z0), and each of the 4 coordinate points may be referred to as a sub-coordinate point. A difference value between C(x0,y0,z0) and a sub-coordinate point D(xk,yk,zk) may be calculated, and a Gaussian kernel function may be adopted to map the difference value between C and D to a weight. Taking the x dimension of the coordinate points C and D as an example, the Gaussian kernel function may be calculated by the following equation (5):
where, w(xi) is a weight, i is a sequence number in a preset number k of coordinate points, σ is a bandwidth parameter, xi represents a value of a sub-coordinate point searched, and x0 represents a value of any one coordinate point corresponding to the sub-coordinate point in the first sequence.
It may be understood that the weight corresponding to each sub-coordinate point is determined by normalizing the weights calculated for the preset number of sub-coordinate points with respect to the coordinate point. The normalization processing may be calculated by the following equation (6):
$$\bar{w}(x_i) = \frac{w(x_i)}{\sum_{j=1}^{k} w(x_j)} \quad (6)$$
where $\bar{w}(x_i)$ represents the weight corresponding to the sub-coordinate point obtained by the normalization processing.
It may be understood that, in the above method for calculating the weight, the greater the difference value between the sub-coordinate point and the any one coordinate point in the first sequence, the smaller the weight. A Gaussian kernel function may be selected to map the difference value into a weight, and the weight decreases with the difference value. Farther points, i.e., sub-coordinate points corresponding to larger differences, have a smaller influence on the elevation of the coordinate point, and a value of the preset number k may further control a maximum difference value between the preset number of coordinate points searched from the K-D tree and the any one coordinate point in the first sequence, which may improve accuracy of calculating an elevation. Further, a bandwidth parameter of the Gaussian kernel function may be adjusted, thereby determining an influence degree of the sub-coordinate point on the elevation of the any one coordinate point.
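The weight calculation of S720 can be sketched as follows. The kernel form exp(-d²/(2σ²)) is an assumed, commonly used Gaussian kernel, and combining the x and y differences into one squared planar distance is also an assumption (the description illustrates the kernel on the x dimension only). The function name `gaussian_weights` and the default `sigma` are hypothetical.

```python
import math


def gaussian_weights(target, neighbours, sigma=1.0):
    """Normalized Gaussian weights of the neighbours relative to the target.

    Assumptions: weight = exp(-d^2 / (2 * sigma^2)), where d is the planar
    (x, y) distance between the target and a neighbour; the weights are
    then divided by their sum so that they add up to 1.
    """
    raw = []
    for n in neighbours:
        sq = (n[0] - target[0]) ** 2 + (n[1] - target[1]) ** 2
        raw.append(math.exp(-sq / (2.0 * sigma ** 2)))
    total = sum(raw)
    if total == 0.0:
        # All weights underflowed to zero: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw) if raw else []
    return [w / total for w in raw]


if __name__ == "__main__":
    weights = gaussian_weights((0.0, 0.0, 0.0), [(1.0, 0.0, 5.0), (2.0, 0.0, 7.0)])
    print(weights)  # the nearer neighbour receives the larger weight
```

A smaller `sigma` concentrates the weight on the nearest neighbours, while a larger `sigma` spreads it more evenly, which is the role the description assigns to the bandwidth parameter.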
At S730, an elevation of the coordinate point is obtained based on the weights corresponding to the sub-coordinate points and elevations corresponding to the sub-coordinate points.
In an embodiment, obtaining the elevation of the coordinate point based on the weights corresponding to the sub-coordinate points and the elevations corresponding to the sub-coordinate points includes: obtaining the elevation of the coordinate point by calculating products of the weights corresponding to the sub-coordinate points and the elevations corresponding to the sub-coordinate points and counting a sum of the products.
It may be understood that, on the basis of S710, the products of the weights corresponding to the sub-coordinate points and the elevations corresponding to the sub-coordinate points are calculated, and the products of all sub-coordinate points are summed to obtain the elevation of any one coordinate point in the first sequence. The elevation may be calculated by the following equation (7):
$$z_0 = \sum_{i=1}^{k} w(x_i)\, z_i \quad (7)$$
where z0 represents the elevation of the coordinate point, w(xi) represents the normalized weight corresponding to the i-th coordinate point in the preset number of coordinate points, and zi represents the elevation corresponding to the i-th coordinate point in the multi-dimensional data structure tree.
For example, taking the diagram of the result of data processing in
In the method for data processing provided in the embodiments of the disclosure, for any one coordinate point in the first point sequence, a preset number of coordinate points close to the coordinate point are determined in the multi-dimensional data structure tree, weights corresponding to sub-coordinate points in the preset number of coordinate points are determined based on the coordinate point and the sub-coordinate points, and an elevation of the coordinate point is obtained based on the weights corresponding to the sub-coordinate points and elevations corresponding to the sub-coordinate points, which may calculate elevations of all coordinate points in the first point sequence generated after being processed with thinning and interpolation, thereby retaining features of data that are elevation information in the data to a maximum extent while reducing a data volume, and smoothing the calculated elevation, with simple calculation, a rapid processing speed and high accuracy.
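Steps S710 to S730 can be put together in one short sketch. To stay self-contained, the nearest points are found here by a brute-force sort, which stands in for the K-D tree query and is not the search described above; the Gaussian kernel form and the use of planar distance are assumptions, and the name `estimate_elevation` and its defaults are hypothetical.

```python
import math


def estimate_elevation(target, reference_points, k=4, sigma=1.0):
    """Estimate the elevation at target from its k nearest reference points.

    reference_points plays the role of the interpolated original sequence.
    Assumptions: brute-force nearest-neighbour search (a stand-in for the
    K-D tree query), planar distances, and weights exp(-d^2 / (2 sigma^2))
    normalized to sum to 1.
    """
    def sq_dist(p):
        return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2

    nearest = sorted(reference_points, key=sq_dist)[:k]
    if not nearest:
        raise ValueError("reference_points must not be empty")
    raw = [math.exp(-sq_dist(p) / (2.0 * sigma ** 2)) for p in nearest]
    total = sum(raw)
    if total == 0.0:
        weights = [1.0 / len(nearest)] * len(nearest)  # underflow fallback
    else:
        weights = [w / total for w in raw]
    # Weighted sum of the neighbours' elevations.
    return sum(w * p[2] for w, p in zip(weights, nearest))


if __name__ == "__main__":
    reference = [(-1.0, 0.0, 0.0), (1.0, 0.0, 2.0), (10.0, 10.0, 100.0)]
    print(estimate_elevation((0.0, 0.0, 0.0), reference, k=2))  # 1.0
```

In the usage example the two nearest points are equally distant from the target, so each receives a weight of 0.5 and the distant third point is excluded by `k=2`.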
Embodiments of the disclosure further include the following application scenario. In
In an embodiment of a feasible application scenario, the terminal 11 may execute S210 and S220 in
In an embodiment of another feasible application scenario, the terminal 11 may execute S210, S220 and S230 in
It may be understood that the methods for constructing the multi-dimensional data structure tree, generating the first point sequence and obtaining the elevations of the first point sequence in the above two feasible application scenarios are the same as in the above embodiments in which the method for data processing is performed by only one of the terminal 11 and the server 12, and will not be repeated here.
Understandably, other scenarios of data transmission may further be included between the terminal 11 and the server 12, which will not be repeated here.
In an application scenario where there is an interaction operation in the data processing process between the terminal 11 and the server 12 provided in embodiments of the present disclosure, for the processed massive data, a processing speed can be accelerated, a processing memory can be reduced, hardware resources can be utilized to a maximum extent, and accuracy of data processing can be ensured. In the process of data transmission, data can further be backed up by the terminal 11 and the server 12 to ensure security of processing data.
In an embodiment, the first processing module 920 interpolates the point sequence, and constructs the multi-dimensional data structure tree by using the interpolated point sequence, and is specifically configured to:
In an embodiment, the first processing module 920 performs the linear interpolation based on the coordinate points in the point sequence, and is specifically configured to:
In an embodiment, the first processing module 920 obtains the interpolation number based on the Euclidean distance and the preset density, and is specifically configured to:
In an embodiment, the first processing module 920 generates the interpolated point sequence based on the interpolation number and the any two coordinate points in the point sequence, and is specifically configured to:
In an embodiment, the first processing module 920 constructs the multi-dimensional data structure tree based on the coordinate points in the interpolated point sequence, and is specifically configured to:
In an embodiment, the second processing module 930 thins the point sequence, and is specifically configured to:
In an embodiment, the calculation module 940 obtains the elevations of the first point sequence based on the multi-dimensional data structure tree and the first point sequence, and is specifically configured to:
In an embodiment, the calculation module 940 determines the weights corresponding to the sub-coordinate points in the preset number of coordinate points based on the coordinate point and the sub-coordinate points, and is specifically configured to:
In an embodiment, the calculation module 940 obtains the elevation of the coordinate point based on the weights corresponding to the sub-coordinate points and the elevations corresponding to the sub-coordinate points, and is specifically configured to:
The apparatus for data processing in the embodiment of
In addition, a computer-readable storage medium having a computer program stored thereon is further provided in embodiments of the present disclosure. The computer program is executed by a processor to implement steps of the method for data processing.
In addition, a computer program product is further provided in embodiments of the disclosure. The computer program product includes a computer program or an instruction. The computer program or instruction is configured to implement the method for data processing provided in any of the above embodiments when executed by a processor.
It should be noted that relational terms such as “first” and “second” are used herein to distinguish one entity or operation from another entity or operation without necessarily requiring or implying any such actual relationship or order between such entities or operations. Moreover, the terms “comprise”, “include” or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, a method, an article or a device including a series of elements not only includes those elements but also includes other elements not expressly listed, or may further include elements inherent to such process, method, article, or device. In the absence of more constraints, an element defined by the phrase “comprise a” does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing is merely a specific embodiment of the present disclosure, provided so that those skilled in the art may understand or implement the present disclosure. Various modifications to the embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the disclosure. Therefore, the present disclosure is not limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
202110536213.6 | May 2021 | CN | national |
This application is the US national phase application of International Application No. PCT/CN2022/077533, filed on Feb. 23, 2022, the entire contents of which are incorporated herein by reference for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/077533 | 2/23/2022 | WO |