The present disclosure relates to the field of data processing in general, and to a solution for predicting a forest stand target attribute.
The prediction of forest attributes on a large scale is an important aspect of managing forest stands. The current focus of so-called national forest inventories (NFI) is on the volume by tree species, the total volume/biomass and the average dimensions (height, diameter) of trees.
One method used in existing inventories is the interpolation of field measurements in sample plots using the k-nearest-neighbor (kNN) method applied to satellite images or airborne laser scans (ALS). Some inventories are augmented with mathematical growth models to adjust for the volume increase since the ALS measurement.
Forest owners and the wood processing industry have a natural interest in knowing the quantitative and qualitative attributes of standing trees in the forest stands they own or intend to purchase. However, it is very difficult and expensive to measure these attributes for large forest areas manually, or even with the support of drones.
For this reason, estimations are commonly used instead to cover large forest areas. However, established methods for estimation are often inaccurate, incomplete, or do not include certain attributes of interest. Established methods such as ALS are good for estimating the height of trees over large areas but are less useful for detecting tree species. Likewise, estimations based on satellite images alone suffer from a lack of spatial resolution.
Therefore, there is still a need for a solution that enables a more accurate estimation or prediction of characteristics of a forest stand or stands.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
It is an object of the present disclosure to provide a technical solution for enabling predictions of one or more forest stand target attributes for one or more forest stands.
The object above is achieved by the features of the independent claims in the appended claims. Further embodiments and examples are apparent from the dependent claims, the detailed description and the accompanying drawings.
According to a first aspect, there is provided a method for building a model for a forest stand target attribute. The method comprises obtaining direct indicator data about forest stands; obtaining indirect indicator data about the forest stands; obtaining empirical measurement data about the forest stands; dividing the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determining values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determining values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the same input variable, the second set of cells geographically corresponding to the first set of cells; converting the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; and applying a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors. This enables generation of a model for the forest stand attribute covering a large geographical area based on a limited set of initial forest stand target attribute values.
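By way of a non-limiting illustration, the conversion of grid layers to grid-specific feature vectors may be sketched as follows. The sketch assumes that each grid layer is held as a two-dimensional list over the same cells and that a cell without an empirically determined target value is marked with None. The names layers_to_feature_vectors, input_layers and target_layer, as well as the example layer names and values, are hypothetical and are not part of the disclosure.

```python
from typing import Dict, List, Optional, Tuple

# One grid layer: a 2-D list indexed [row][col]; None marks "no value".
Layer = List[List[Optional[float]]]
Sample = Tuple[Tuple[int, int], List[float], float]


def layers_to_feature_vectors(
    input_layers: Dict[str, Layer], target_layer: Layer
) -> Tuple[List[str], List[Sample]]:
    """Build one feature vector per cell that has a target value.

    Returns the ordered input variable names and a list of
    (cell index, feature vector, target value) samples.
    """
    names = sorted(input_layers)  # fixed order so vectors are comparable
    samples: List[Sample] = []
    for r, row in enumerate(target_layer):
        for c, target in enumerate(row):
            if target is None:
                continue  # cell is outside the first set of cells
            features = [input_layers[name][r][c] for name in names]
            if any(value is None for value in features):
                continue  # skip cells with a missing input value
            samples.append(((r, c), features, target))
    return names, samples


# Hypothetical 2 x 2 grid with two input layers and one target layer.
inputs = {
    "canopy_height_m": [[18.0, 21.5], [12.0, 25.0]],
    "mean_temp_c": [[4.1, 4.0], [3.8, 4.3]],
}
target = [[210.0, None], [None, 320.0]]  # e.g. volume per hectare
names, samples = layers_to_feature_vectors(inputs, target)
```

The resulting samples may then be passed to whichever training algorithm is chosen; the sketch deliberately leaves that choice open.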
According to a second aspect, there is provided a method for predicting a forest stand target attribute. The method comprises obtaining direct indicator data about forest stands, the direct indicator data comprising imaging data, scanning data and/or measurement data about the forest stands; obtaining indirect indicator data about the forest stands, the indirect indicator data comprising data associated with growth of wood in forest stands; obtaining empirical measurement data about the forest stands, the empirical measurement data being obtained from at least one source processing wood and/or harvesting wood; dividing the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determining values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determining values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the corresponding same input variable, the second set of cells geographically corresponding to the first set of cells; converting the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; applying a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors; determining values of the plurality of input variables for a given cell of the remaining grid layers based on the direct indicator data and the indirect indicator data; constructing an input feature vector for the given cell based on the values of the plurality of input variables for the given cell; and predicting the value of the forest stand target attribute for the given cell based on the input feature vector and the trained model for the forest stand target attribute.
In an implementation form of the first aspect, the method further comprises determining values of the plurality of input variables for a given cell of the remaining grid layers based on the direct indicator data and the indirect indicator data; constructing an input feature vector for the given cell based on the values of the plurality of input variables for the given cell; and predicting the value of the forest stand target attribute for the given cell based on the input feature vector and the trained model for the forest stand target attribute. This enables users to get forest stand target attribute estimates for a particular location, and offers the flexibility of aggregating such estimates to regions of arbitrary size and shape. Good estimations of forest stand attributes may provide decision support, for example, for planning harvest operations, selling/buying forest assets, and selling/buying wood from forest stands.
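As a minimal sketch of predicting the target attribute for a given cell, the example below trains a model on labelled feature vectors and applies it to the input feature vector of one cell. The disclosure does not prescribe a particular training algorithm; a small k-nearest-neighbour regressor is used here purely as a stand-in. The class name KnnRegressor and all numeric values are hypothetical. In practice the input variables would likely be scaled before distances are computed, which the sketch omits.

```python
import math
from typing import List, Sequence


class KnnRegressor:
    """Minimal k-nearest-neighbour regressor (illustrative stand-in only)."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self._x: List[Sequence[float]] = []
        self._y: List[float] = []

    def fit(self, x: List[Sequence[float]], y: List[float]) -> "KnnRegressor":
        # "Training" a k-NN model amounts to storing the labelled vectors.
        self._x, self._y = list(x), list(y)
        return self

    def predict(self, features: Sequence[float]) -> float:
        # Rank training cells by Euclidean distance to the query cell.
        ranked = sorted(
            zip(self._x, self._y),
            key=lambda pair: math.dist(pair[0], features),
        )
        nearest = ranked[: self.k]
        return sum(value for _, value in nearest) / len(nearest)


# Feature vectors and target values of three labelled cells (hypothetical).
train_x = [[18.0, 4.1], [25.0, 4.3], [12.0, 3.8]]
train_y = [210.0, 320.0, 130.0]
model = KnnRegressor(k=2).fit(train_x, train_y)

# Input feature vector of a given cell without an empirical measurement.
predicted = model.predict([20.0, 4.2])
```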
In an implementation form of the first or second aspect, the method further comprises predicting the value of the forest stand target attribute for each cell of a forest stand; and calculating a forest stand-level value of the forest stand target attribute based on the values of the forest stand target attribute for all cells of the forest stand. This enables users to get forest attribute estimates for a particular stand or group of stands, and to locate stands matching a wide range of search criteria (for example, geographic region, quantity per species, quality parameters, growth rate or any combination of these).
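The stand-level calculation may be illustrated as follows. The disclosure does not fix how the cell values are combined; the sketch assumes a plain arithmetic mean, which suits per-hectare attributes, whereas a sum or an area-weighted mean may suit other attributes. The function name stand_level_values and the stand identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple

Cell = Tuple[int, int]


def stand_level_values(
    cell_predictions: Dict[Cell, float], cell_to_stand: Dict[Cell, Hashable]
) -> Dict[Hashable, float]:
    """Average the per-cell predictions of each stand (assumed aggregation)."""
    by_stand: Dict[Hashable, List[float]] = defaultdict(list)
    for cell, value in cell_predictions.items():
        by_stand[cell_to_stand[cell]].append(value)
    return {stand: mean(values) for stand, values in by_stand.items()}


# Hypothetical per-cell predictions and cell-to-stand membership.
predictions = {(0, 0): 200.0, (0, 1): 240.0, (1, 0): 310.0}
membership = {(0, 0): "stand_A", (0, 1): "stand_A", (1, 0): "stand_B"}
per_stand = stand_level_values(predictions, membership)
```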
In an implementation form of the first or second aspect, a cell comprises a plurality of sub-cells, and the method further comprises calculating a value associated with an input variable for a cell based on values associated with an input variable for the plurality of sub-cells of the cell. This enables determining a single value of an input variable for a cell based on a set of values associated with the sub-cells, so that input data with a higher spatial resolution (such as remote sensing data) can be combined with low-resolution input variables to produce more accurate estimations.
In a further implementation form of the first or second aspect, the value associated with the input variable is calculated by using a convolutional neural network, statistical aggregation, or filters combined with aggregation. Using a convolutional neural network, for example, may provide higher overall accuracy at the cost of additional computation time.
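The statistical aggregation variant may be sketched as a block reduction, in which each cell value is computed from the square block of sub-cells that the cell covers. The sketch assumes that the high-resolution raster aligns with the grid and that its size is a multiple of the block size; the function name aggregate_sub_cells, the parameter factor and the sample values are hypothetical. The convolutional neural network variant is not shown.

```python
from statistics import mean
from typing import Callable, List, Sequence

Raster = List[List[float]]


def aggregate_sub_cells(
    fine: Raster, factor: int, reducer: Callable[[Sequence[float]], float] = mean
) -> Raster:
    """Reduce each factor x factor block of sub-cells to one cell value."""
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("raster size must be a multiple of factor")
    coarse: Raster = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [
                fine[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            coarse_row.append(reducer(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 sub-cell raster reduced to a 2 x 2 grid layer.
fine_heights = [
    [10.0, 12.0, 20.0, 22.0],
    [14.0, 16.0, 24.0, 26.0],
    [5.0, 5.0, 8.0, 8.0],
    [5.0, 5.0, 8.0, 12.0],
]
cell_means = aggregate_sub_cells(fine_heights, factor=2)
cell_maxima = aggregate_sub_cells(fine_heights, factor=2, reducer=max)
```

Passing a different reducer (for example max or a percentile function) yields further derived layers from the same sub-cell data.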
In a further implementation form of the first or second aspect, the indirect indicator data comprises time series for each of N input variables for a cell, and the method further comprises: calculating an optimal aggregation function for computing a single derived input variable value from a subset of up to N input variables from time series of these input variables for each cell, so that the aggregation function maximizes a correlation between the single derived input variable and the forest stand target attribute; and applying the aggregation function to all cells of the grid for computing a derived input grid layer. This enables assigning specific values of an input variable for a cell of a grid layer even if the input variable is a time series input variable.
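One simple way to approximate the selection of an aggregation function is to evaluate a small set of candidate functions on the cells with a known target value and to keep the candidate whose output correlates best with the target. The sketch below makes several simplifying assumptions that go beyond the disclosure: it handles the time series of a single input variable, it searches a hypothetical candidate set (mean, min, max, last value), and it ranks candidates by the absolute Pearson correlation. All names and values are illustrative.

```python
import math
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Cell = Tuple[int, int]
Aggregator = Callable[[Sequence[float]], float]

# Hypothetical candidate set; the disclosure does not enumerate candidates.
CANDIDATES: Dict[str, Aggregator] = {
    "mean": mean,
    "min": min,
    "max": max,
    "last": lambda series: series[-1],
}


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_aggregation(
    series_by_cell: Dict[Cell, Sequence[float]], target_by_cell: Dict[Cell, float]
) -> Tuple[str, float]:
    """Pick the candidate whose output correlates best with the target."""
    cells = list(target_by_cell)  # only cells with a known target value
    targets = [target_by_cell[cell] for cell in cells]
    best_name, best_score = "", -1.0
    for name, fn in CANDIDATES.items():
        derived = [fn(series_by_cell[cell]) for cell in cells]
        score = abs(pearson(derived, targets))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def derived_layer(
    series_by_cell: Dict[Cell, Sequence[float]], name: str
) -> Dict[Cell, float]:
    """Apply the selected aggregation to every cell of the grid."""
    fn = CANDIDATES[name]
    return {cell: fn(series) for cell, series in series_by_cell.items()}


# Hypothetical time series for four cells; three cells have a target value.
series = {
    (0, 0): [1.0, 5.0, 2.0],
    (0, 1): [2.0, 3.0, 4.0],
    (1, 0): [3.0, 9.0, 1.0],
    (1, 1): [2.0, 6.0, 3.0],
}
targets = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 9.0}
best, score = select_aggregation(series, targets)
layer = derived_layer(series, best)
```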
In a further implementation form of the first or second aspect, the method further comprises transforming forest stand level empirical measurement data to grid-level estimates of the forest stand level empirical measurement data. This enables determining values of an input variable for individual cells even if the forest stand level empirical measurement data covers a large geographical area.
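The disclosure does not state how stand-level data is transformed to grid-level estimates. One simple possibility, shown here only as an assumption, is to split a stand-level total evenly over the cells of the stand. The function name spread_stand_total and the example quantities are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Cell = Tuple[int, int]


def spread_stand_total(
    stand_totals: Dict[Hashable, float], stand_cells: Dict[Hashable, List[Cell]]
) -> Dict[Cell, float]:
    """Split each stand-level total evenly over the cells of that stand."""
    per_cell: Dict[Cell, float] = {}
    for stand, total in stand_totals.items():
        cells = stand_cells[stand]
        share = total / len(cells)  # uniform split: an assumption
        for cell in cells:
            per_cell[cell] = share
    return per_cell


# Hypothetical harvested volume (cubic metres) reported for one stand.
harvested_m3 = {"stand_A": 900.0}
cells_of = {"stand_A": [(0, 0), (0, 1), (1, 0)]}
grid_estimates = spread_stand_total(harvested_m3, cells_of)
```

For attributes that are already expressed per unit area, the stand-level value could instead be copied unchanged to every cell of the stand.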
In a further implementation form of the first or second aspect, the method further comprises attributing empirical measurement data associated with a specific geographical location to a respective cell covering the specific geographical location. This enables determining values of an input variable for a cell where empirical measurement data is available for a very specific geographical location, thereby increasing the correlation with other input variables for this cell and hence the overall accuracy of the model and predictions.
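Attributing a georeferenced measurement to the cell covering its location may be sketched as an index calculation. The sketch assumes projected coordinates in the same unit as the cell size, square cells, and a grid origin at the south-west corner; it also assumes that several measurements falling into one cell are averaged. The function names, the origin, the cell size of 16 and the measurement values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]


def cell_index(
    x: float, y: float, origin_x: float, origin_y: float, cell_size: float
) -> Cell:
    """Return (row, col) of the cell covering point (x, y).

    Assumes a south-west grid origin, so rows count northwards.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    return row, col


def attribute_to_cells(
    measurements: Iterable[Tuple[float, float, float]],
    origin_x: float,
    origin_y: float,
    cell_size: float,
) -> Dict[Cell, float]:
    """Average all point measurements (x, y, value) that share a cell."""
    by_cell: Dict[Cell, List[float]] = defaultdict(list)
    for x, y, value in measurements:
        by_cell[cell_index(x, y, origin_x, origin_y, cell_size)].append(value)
    return {cell: mean(values) for cell, values in by_cell.items()}


# Hypothetical harvester records: (easting, northing, stem diameter in cm).
points = [(1234.0, 567.0, 30.0), (1236.0, 570.0, 34.0), (1010.0, 505.0, 28.0)]
per_cell = attribute_to_cells(points, origin_x=1000.0, origin_y=500.0, cell_size=16.0)
```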
In a further implementation form of the first or second aspect, the method further comprises predicting, for the forest stands, values of at least one forest stand target attribute based on the trained model for each such forest stand target attribute; and applying at least one search criterion to find at least one forest stand matching the at least one search criterion. This makes it possible to locate forest stands matching a wide range of search criteria (for example, geographic region, quantity per species, quality parameters, growth rate or any combination of these).
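Applying search criteria to predicted stand-level attributes may be illustrated as a simple filter. The sketch assumes that each criterion is a predicate over the predicted attributes of one stand and that a stand matches only if it satisfies every criterion; the function name find_stands, the attribute names and the thresholds are hypothetical.

```python
from typing import Callable, Dict, Hashable, List

StandAttributes = Dict[str, float]
Criterion = Callable[[StandAttributes], bool]


def find_stands(
    stands: Dict[Hashable, StandAttributes], criteria: List[Criterion]
) -> List[Hashable]:
    """Return the stands whose predicted attributes satisfy every criterion."""
    return [
        stand_id
        for stand_id, attrs in stands.items()
        if all(test(attrs) for test in criteria)
    ]


# Hypothetical predicted attributes for three stands.
predicted = {
    "stand_A": {"volume_m3_per_ha": 220.0, "spruce_share": 0.7},
    "stand_B": {"volume_m3_per_ha": 310.0, "spruce_share": 0.2},
    "stand_C": {"volume_m3_per_ha": 280.0, "spruce_share": 0.6},
}
criteria = [
    lambda a: a["volume_m3_per_ha"] >= 250.0,  # minimum volume per hectare
    lambda a: a["spruce_share"] >= 0.5,        # mostly spruce
]
matches = find_stands(predicted, criteria)
```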
In a further implementation form of the first or second aspect, the direct indicator data comprises at least one of forest inventory estimates, airborne laser scan data, field measurement data, optical, hyperspectral or radar satellite data, and aerial image data. The direct indicator data provides input variables for estimating the inventory of a forest stand and enables prediction of species distribution, total volume/biomass, and log dimensions.
In a further implementation form of the first or second aspect, the indirect indicator data comprises at least one of silvicultural data, geographical data, geological data, historical weather and climate data. The indirect indicator data provides input variables for estimating wood quality and enables prediction of wood class and saw log quality. The indirect indicator data also provides input variables for estimating the species distribution and growth rate and therefore enables more accurate forest inventory estimates, especially when the age of the forest stand is known.
In a further implementation form of the first or second aspect, the empirical measurement data comprises at least one of harvester machine data, X-ray data, saw mill data, pulp mill data and integrated mills data. The use of empirical measurement data collected, for example, automatically during harvest operations helps to save costs compared to the labor-intensive method of field measurements in forests. The use of quality measurements from wood-processing mills enables the cost-efficient prediction of wood quality attributes, which would be more difficult, inaccurate and expensive when done via field measurements.
In a further implementation form of the first or second aspect, the forest stand target attribute comprises one of: distribution of tree species, distribution of wood classes, distribution of log classes, sawlog quality, pulp wood quality, forest growth rate, volume per hectare, basal area, average diameter, average height, average diameter at breast height, average volume per stem, number of stems per hectare, recommended harvest operation, risk of forest damages by fire, risk of forest damages by storm, and risk of forest damages by pests. Knowledge of these forest stand attributes provides guidance for buyers of standing stock and enables objective valuation of forest stands and their wood inventory.
According to a third aspect, there is provided a system for building a model for a forest stand target attribute. The system comprises at least one processing unit and at least one memory. The at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to obtain direct indicator data about forest stands; obtain indirect indicator data about the forest stands; obtain empirical measurement data about the forest stands; divide the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determine values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determine values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the same input variable, the second set of cells geographically corresponding to the first set of cells; convert the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; and apply a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors.
According to a fourth aspect, there is provided a system for predicting a forest stand target attribute. The system comprises at least one processing unit and at least one memory. The at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to obtain direct indicator data about forest stands, the direct indicator data comprising imaging data, scanning data and/or measurement data about the forest stands; obtain indirect indicator data about the forest stands, the indirect indicator data comprising data associated with growth of wood in forest stands; obtain empirical measurement data about the forest stands, the empirical measurement data being obtained from at least one source processing wood and/or harvesting wood; divide the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determine values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determine values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the corresponding same input variable, the second set of cells geographically corresponding to the first set of cells; convert the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; apply a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors; determine values of the plurality of input variables for a given cell of the remaining grid layers based on the direct indicator data and the indirect indicator data; construct an input feature vector for the given cell based on the values of the plurality of input variables for the given cell; and predict the value of the forest stand target attribute for the given cell based on the input feature vector and the trained model for the forest stand target attribute.
In an implementation form of the third aspect, the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to determine values of the plurality of input variables for a given cell of the remaining grid layers based on the direct indicator data and the indirect indicator data; construct an input feature vector for the given cell based on the values of the plurality of input variables for the given cell; and predict the value of the forest stand target attribute for the given cell based on the input feature vector and the trained model for the forest stand target attribute.
In an implementation form of the third or fourth aspect, the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to predict the value of the forest stand target attribute for each cell of a forest stand; and calculate a forest stand-level value of the forest stand target attribute based on the values of the forest stand target attribute for all cells of the forest stand.
In an implementation form of the third or fourth aspect, a cell comprises a plurality of sub-cells, and the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to calculate a value associated with an input variable for a cell based on values associated with an input variable for the plurality of sub-cells of the cell.
In a further implementation form of the third or fourth aspect, the value associated with the input variable is calculated by using a convolutional neural network, statistical aggregation, or filters combined with aggregation.
In a further implementation form of the third or fourth aspect, the indirect indicator data comprises time series for each of N input variables for a cell, and the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to: calculate an optimal aggregation function for computing a single derived input variable value from a subset of up to N input variables from time series of these input variables for each cell, so that the aggregation function maximizes a correlation between the single derived input variable and the forest stand target attribute; and apply the aggregation function to all cells of the grid for computing a derived input grid layer.
In a further implementation form of the third or fourth aspect, the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to transform forest stand level empirical measurement data to grid-level estimates of the forest stand level empirical measurement data.
In a further implementation form of the third or fourth aspect, the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to attribute empirical measurement data associated with a specific geographical location to a respective cell covering the specific geographical location.
In a further implementation form of the third or fourth aspect, the at least one memory stores program instructions that, when executed by the at least one processing unit, cause the system to predict, for the forest stands, values of at least one forest stand target attribute based on the trained model for each such forest stand target attribute; and apply at least one search criterion to find at least one forest stand matching the at least one search criterion.
In a further implementation form of the third or fourth aspect, the direct indicator data comprises at least one of forest inventory estimates, airborne laser scan data, field measurement data, optical, hyperspectral or radar satellite data, and aerial image data.
In a further implementation form of the third or fourth aspect, the indirect indicator data comprises at least one of silvicultural data, forest inventory data, geographical data, geological data, historical weather and climate data.
In a further implementation form of the third or fourth aspect, the empirical measurement data comprises at least one of harvester machine data, X-ray data, saw mill data, pulp mill data and integrated mills data.
In a further implementation form of the third or fourth aspect, the forest stand target attribute comprises one of: distribution of tree species, distribution of wood classes, distribution of log classes, sawlog quality, pulp wood quality, forest growth rate, volume per hectare, basal area, average diameter, average diameter at breast height, average height, average volume per stem, number of stems per hectare, recommended harvest operation, risk of forest damages by fire, risk of forest damages by storm, and risk of forest damages by pests.
According to a fifth aspect, there is provided a computer program comprising program code which, when executed by at least one processor, performs the method of the first or second aspect.
According to a sixth aspect, there is provided a computer-readable medium comprising a computer program comprising program code which, when executed by at least one processor, performs the method of the first or second aspect.
According to a seventh aspect, there is provided a system for building a model for a forest stand target attribute. The system comprises means for performing: obtaining direct indicator data about forest stands; obtaining indirect indicator data about the forest stands; obtaining empirical measurement data about the forest stands; dividing the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determining values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determining values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the same input variable, the second set of cells geographically corresponding to the first set of cells; converting the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; and applying a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors.
According to an eighth aspect, there is provided a system for predicting a forest stand target attribute. The system comprises means for performing: obtaining direct indicator data about forest stands, the direct indicator data comprising imaging data, scanning data and/or measurement data about the forest stands; obtaining indirect indicator data about the forest stands, the indirect indicator data comprising data associated with growth of wood in forest stands; obtaining empirical measurement data about the forest stands, the empirical measurement data being obtained from at least one source processing wood and/or harvesting wood; dividing the forest stands into a grid composed of geographically non-overlapping cells, the grid comprising a plurality of grid layers; determining values of a forest stand target attribute for a first set of cells of a grid layer based on the empirical measurement data; determining values of a plurality of input variables for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that cells of each remaining grid layer comprise values associated with the corresponding same input variable, the second set of cells geographically corresponding to the first set of cells; converting the grid layers to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid; applying a training algorithm for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors; determining values of the plurality of input variables for a given cell of the remaining grid layers based on the direct indicator data and the indirect indicator data; constructing an input feature vector for the given cell based on the values of the plurality of input variables for the given cell; and predicting the value of the forest stand target attribute for the given cell based on the input feature vector and the trained model for the forest stand target attribute.
In a further implementation form of the seventh or eighth aspect, the means comprises at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the system.
Other features and advantages of the present invention will be apparent upon reading the following detailed description and reviewing the accompanying drawings.
The essence of the present invention is explained below with reference to the accompanying drawings in which:
In the following description, references are made to the accompanying drawings, which form part of the present disclosure, and in which are shown, by way of illustration, specific aspects, embodiments and examples in which the present disclosure may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present disclosure is defined by the appended claims. Further, the present disclosure can be embodied in many other forms and should not be construed as limited to any certain structure or function disclosed in the following description.
Based on the detailed description, it will be apparent to those skilled in the art that the scope of the present disclosure covers any embodiment disclosed herein, irrespective of whether this embodiment is implemented independently or in concert with any other embodiment of the present disclosure. For example, the apparatus and method disclosed herein can be implemented in practice by using any number of the embodiments provided herein. Furthermore, it should be understood that any embodiment of the present disclosure can be implemented using one or more of the elements presented in the appended claims.
As used herein, the term “forest stand” may refer to a geographically restricted area that is governed and/or owned by a specific entity. A plurality of forest stands may be geographically close to each other, or alternatively, they may be distributed across multiple geographically separate locations.
As used herein, the term “forest stand target attribute” may refer to any attribute that is measurable for a forest stand and that somehow characterizes the forest stand. For example, a forest stand target attribute may determine a distribution of tree species in the forest stand, a distribution of wood classes (for example, log wood, pulp wood, energy wood etc.), a distribution of log classes, sawlog quality (for example, in terms of knots and/or branches), pulp wood quality, forest growth rate, volume per hectare, basal area, average diameter, average diameter at breast height, average height, average volume per stem, number of stems per hectare, recommended harvest operation (such as first or subsequent thinning, or regenerative felling, for example, according to national forest management guidelines), risk of forest damages by fire, risk of forest damages by storm, and risk of forest damages by pests etc.
As used herein, the term “input variable” may refer to any variable that can be measured about one or more forest stands or that affects the development of trees in one or more forest stands. Input variables may be determined, for example, based on at least one of optical image data, synthetic aperture radar data, airborne laser scanning data, satellite image data, silvicultural data, forest inventory data, geographical data, geological data, historic weather data, historic climate data etc.
As used herein, the term “grid” may refer to a structure composed of geographically non-overlapping cells. In other words, a geographical area can be divided into a plurality of geographically non-overlapping cells, and the cells together constitute the grid.
As used herein, the term “grid layer” may refer to a sub-part associated with the grid. A plurality of grid layers may be associated with the grid. Each grid layer associated with the grid comprises or covers the same set of geographically non-overlapping cells.
Forest owners and the wood processing industry have a natural interest in knowing the quantitative and qualitative attributes of standing trees in the forest stands they own or intend to purchase. However, it is very difficult and expensive to measure these attributes for large forest areas manually, or even with the support of drones. The present disclosure provides a solution for training a model for predicting a forest stand target attribute and for predicting the forest stand target attribute. The solution uses direct indicator data about forest stands, indirect indicator data about the forest stands and empirical measurement data about the forest stands to build a trained model for the forest stand target attribute. Using the trained model, it is possible to predict, for a given forest stand, the value of the forest stand target attribute.
At 100, direct indicator data about forest stands is obtained. The direct indicator data may refer to imaging data, scanning data and/or measurement data that is available about the forest stands. The terms “imaging data”, “scanning data” and “measurement data” are to be understood broadly to refer to any data representing or originating from measurements of standing trees. The direct indicator data may comprise, for example, at least one of aerial image data, synthetic aperture radar data, airborne laser scanning data, satellite image data etc.
At 102, indirect indicator data about the forest stands is obtained. The indirect indicator data may refer to data that helps to explain growth of trees in the forest stands and/or to data associated with growth of wood in forest stands. The indirect indicator data may comprise, for example, at least one of silvicultural data, geographical data, geological data, historic weather data, historic climate data etc. The geographical data may refer, for example, to at least one of geographic location data, altitude data, steepness data and direction of a terrain slope. The geological data may refer, for example, to soil type, soil thickness, water storage capacity and concentration of plant nutrients.
At 104, empirical measurement data about the forest stands is obtained. The empirical measurement data may refer to data obtained from at least one source processing wood and/or harvesting wood. The empirical measurement data may refer, for example, to data obtained from harvest operations and mills. The empirical measurement data may comprise, for example, measurement data from harvester machines, measurement data from log-sorting machines in saw mills, X-ray data from saw mills, measurement data from pulp mills and integrated mills etc.
At 106, the forest stands are divided into a grid composed of geographically non-overlapping cells. In an example, the forest stands may geographically cover a whole country or only certain parts of the country. Further, the forest stands may be governed or owned by one or more entities. The grid comprises a plurality of grid layers. Each grid layer covers the same set of geographically non-overlapping cells.
At 108, values of a forest stand target attribute are determined for a first set of cells of a grid layer based on the empirical measurement data. Each cell of the first set of cells of the grid layer has a specific value of the forest stand target attribute. The forest stand target attribute may refer, for example, to at least one of distribution of tree species, distribution of wood classes, distribution of log classes, sawlog quality, pulp wood quality, forest growth rate, volume per hectare, basal area, average diameter, average diameter at breast height, average height, average volume per stem, number of stems per hectare, recommended harvest operation, risk of forest damages by fire, risk of forest damages by storm, and risk of forest damages by pests. In an embodiment, the first set of cells associated with the grid layer does not comprise all cells of the grid layer. In other words, values of the forest stand target attribute are available only for a subset of all cells of the grid layer, i.e. the first set of cells, and in some embodiments only for a small or a very small subset. In an embodiment, values of the forest stand target attribute statistically represent the whole geographical area covered by the grid. In some embodiments, the order of magnitude of the “small subset” may be in the range of 0.01% (1 out of 10,000) down to 0.0001% (1 out of 1,000,000) of all cells, depending on the geographic independence of the cells. Typically, the forest stand target attribute cells come in clusters (one cluster per measured forest stand), and from a statistical perspective a large cluster does not contain much more information than a small cluster due to the similarity of the cells within the cluster.
At 110, values of a plurality of input variables are determined for a second set of cells of the remaining grid layers based on the direct indicator data and the indirect indicator data so that the cells of each remaining grid layer comprise values associated with the same input variable. In other words, each remaining grid layer may comprise only values associated with a specific input variable. The second set of cells geographically corresponds to the first set of cells. In an embodiment, although values of the plurality of input variables may be available for all or almost all cells of the remaining grid layers, only those values are used that relate to cells for which a value of the forest stand target attribute is available.
At 112, the grid layers are converted to grid-specific feature vectors so that each grid-specific feature vector corresponds to a single cell of the grid. In an embodiment, each grid-specific feature vector comprises scalar values corresponding to the single cell.
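The conversion of grid layers to per-cell feature vectors may be sketched as follows. This is a minimal illustration only, assuming that every grid layer is held as a two-dimensional NumPy array of identical shape and that cells without a known target value are marked with NaN. The function name `layers_to_feature_vectors` and the sample values are hypothetical and not part of the disclosure.

```python
import numpy as np

def layers_to_feature_vectors(input_layers, target_layer):
    """Stack aligned grid layers into one feature vector per labelled cell.

    input_layers: dict mapping a layer name to a 2-D array (one scalar per cell).
    target_layer: 2-D array of the same shape, NaN where no target value is known.
    Returns (X, y, cells): one row of X per labelled cell, the target values y,
    and the (row, col) index of each labelled cell.
    """
    names = sorted(input_layers)
    # Shape (rows, cols, n_layers): the last axis holds the input variables of a cell.
    stacked = np.stack([input_layers[n] for n in names], axis=-1)
    known = ~np.isnan(target_layer)   # cells for which a target value is available
    X = stacked[known]                # shape (n_known, n_layers)
    y = target_layer[known]
    cells = np.argwhere(known)
    return X, y, cells

# Illustrative 2x2 grid with input layers A, B, C and target layer T (made-up values).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 20.0], [30.0, 40.0]])
C = np.array([[0.1, 0.2], [0.3, 0.4]])
T = np.array([[5.0, np.nan], [np.nan, 8.0]])
X, y, cells = layers_to_feature_vectors({"A": A, "B": B, "C": C}, T)
```

In this sketch only the two cells with a known target value yield training rows, which mirrors the use of only those input values that correspond to cells of the first set.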
At 114, a training algorithm is applied for the forest stand target attribute to generate a trained model for the forest stand target attribute based on the grid-specific feature vectors. The training algorithm searches for the most accurate and most general approximative function (the trained model) for computing the forest stand target attribute from the plurality of input variables at grid cell level. The term “training algorithm” generally refers to any supervised machine learning algorithm that can be used to generate the trained model. Similarly, the term “trained model” generally refers to a machine learning model produced with the machine learning algorithm. In an embodiment, the machine learning algorithm is preferably a regression algorithm, for example an error-minimizing, non-linear machine learning algorithm such as an Artificial Neural Network, Decision Tree, Random Forest, or Gradient Boosted Trees, or any algorithm which can handle a hundred or more, potentially collinear, input variables. The regression algorithm minimizes the estimation error of the approximative function (model) by iteratively estimating the forest stand target attribute based on the input variables within the grid-specific feature vectors and adjusting the model parameters depending on the algorithm and its chosen hyperparameters. Each training cycle is repeated for different train/test splits of the available feature vectors to ensure that the model generalizes sufficiently to previously unseen data (cross-validation). By systematically repeating the learning process with different hyperparameters and potentially different algorithms, the model accuracy is further improved until an optimal model has been found.
The solution disclosed above in
Arrows between the data entities exemplify possible relations between the data entities. For example, each data entity 224, 226, 228 has an effect on each item of empirical data, while log quality 220 does not have a relation to the direct indicator data 200. As another example, the airborne laser scanning 210 enables determination of the biomass volume 216 and log dimension 218 but not the tree species 214.
In
The shared geo data layer 306 comprises geo data importers 322. Geo data and historic weather and climate data are imported into the forest stand target attribute prediction system 300 with the geo data importers 322. A geo data integrator 320 processes the imported data into a form that can later be used by a training system 312.
The forest data layer 304 comprises forest data importers 316. Data 324 from a forest owner can be imported into the forest stand target attribute prediction system 300 with the forest data importers 316. The data 324 may comprise harvester files, stand geometry data, silvicultural data and inventory data. A forest data integrator 318 processes the imported data into a form that can later be used by the training system 312.
The prediction layer 302 comprises an image cache 314 that receives data from various image sources 326, for example, satellite image data, aerial image data, airborne laser scanning data etc. The training system 312 receives data from the geo data integrator 320, the forest data integrator 318 and the image cache 314. This data is used to train machine-learning based algorithms to enable prediction of at least one forest stand target attribute and to provide trained models for forest stand target attributes. A prediction system 310 then uses the trained models for making predictions, for example, for a forest stakeholder 308 having a material interest in the forest stand attributes. As an example, the forest owner may want to determine an ideal forest stand or forest stands based on specific forest stand target attributes. For example, the forest owner may want to cut only spruce trees in a specific amount and with specific log dimension and log quality parameters within 150 km of a specific saw mill. By using the prediction system 310, the forest owner 308 is able to determine which forest stands alone or together fulfill these parameters.
In an embodiment, values for cells 402A1,1, 402A1,2, 402A2,1, 402A2,2 of the grid 400 may relate to direct indicator data about forest stands, indirect indicator data about the forest stands, and empirical measurement data about the forest stands. A single scalar value is preferably associated with each cell 402A1,1, 402A1,2, 402A2,1, 402A2,2.
In an embodiment, low-resolution grid values A<x,y> (e.g. a value for the cell 402A1,1) may be calculated from high-resolution grid values a<i,j> (e.g. values of sub-cells 404). The calculation may be made using any appropriate method, for example, a convolutional neural network, statistical aggregation, or filters combined with statistical aggregation.
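As one possible illustration of the statistical aggregation option, the sketch below computes each low-resolution value as a statistic over a square block of sub-cells. It assumes that the high-resolution grid is a two-dimensional NumPy array whose dimensions are exact multiples of the block size. The function name `aggregate_blocks` and the parameter `factor` are hypothetical.

```python
import numpy as np

def aggregate_blocks(high_res, factor, func=np.mean):
    """Aggregate factor x factor blocks of sub-cells into one low-resolution value each."""
    rows, cols = high_res.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the block size")
    # Split each axis into (block index, position within block).
    blocks = high_res.reshape(rows // factor, factor, cols // factor, factor)
    # Reduce over the two within-block axes.
    return func(blocks, axis=(1, 3))

# Illustrative 4x4 high-resolution grid reduced to a 2x2 grid (made-up values).
a = np.arange(16, dtype=float).reshape(4, 4)
A_mean = aggregate_blocks(a, 2)            # block means
A_max = aggregate_blocks(a, 2, np.max)     # block maxima
```

Any NumPy reduction that accepts an `axis` tuple, such as `np.median` or `np.std`, could be passed as `func` in this sketch.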
The grid may comprise a plurality of grid layers 400A, 400B, 400C, 400T.
As already discussed in relation to
In some embodiments, values of the forest stand target attribute T are available only for a subset of all cells of the grid layer 400T, and in some embodiments only for a small or a very small subset. In an embodiment, values of the forest stand target attribute statistically represent the whole geographical area covered by the grid. In some embodiments, the order of magnitude of the “small subset” may be in the range of 0.01% (1 out of 10,000) down to 0.0001% (1 out of 1,000,000) of all cells, depending on the geographic independence of the cells. Typically, the forest stand target attribute cells come in clusters (one cluster per measured forest stand), and from a statistical perspective a large cluster does not contain much more information than a small cluster due to the similarity of the cells within the cluster. Further, in some embodiments, although values of the plurality of input variables A, B, C may be available for all or almost all cells of the grid layers 400A, 400B, 400C, only those values may be used that relate to cells for which a value of the forest stand target attribute T is available.
The grid layers 400A, 400B, 400C, 400T may be pre-computed from geographic source data by normalization to the target coordinate system and re-sampling to match the grid coordinates and geo-references of the target grid. The grid layers 400A, 400B, 400C, 400T may be stored in a geospatial database with indexes allowing fast access and joins with corresponding values from other grid layers.
In some embodiments, latitude and longitude values of the centerpoint of a grid cell may be used as input variables for the regression algorithm, to account for a potential geographic bias in the forest attributes.
In some embodiments, airborne laser scans (ALS) may be used to produce a grid layer providing the average height of trees within grid cells.
In some embodiments, a soil type may be used as an input grid layer where each grid cell is associated with its predominant soil type. The predominant soil type of a grid cell may be calculated as the soil type occupying the largest area of the given grid cell among all soil types covering the same area in a geographic map of soil types.
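The predominant soil type per grid cell may, for example, be approximated as shown below. The sketch assumes that the soil map has already been rasterized to integer soil-type codes at a finer resolution than the grid, so that counting sub-cells approximates the covered area. The function name `predominant_class` and the soil codes are hypothetical.

```python
import numpy as np

def predominant_class(class_map, factor):
    """Return, per factor x factor block, the class code covering the most sub-cells."""
    rows, cols = class_map.shape
    out = np.empty((rows // factor, cols // factor), dtype=class_map.dtype)
    for i in range(rows // factor):
        for j in range(cols // factor):
            block = class_map[i * factor:(i + 1) * factor, j * factor:(j + 1) * factor]
            codes, counts = np.unique(block, return_counts=True)
            # np.unique sorts the codes, so a tie resolves to the lowest code.
            out[i, j] = codes[np.argmax(counts)]
    return out

# Illustrative 4x4 soil raster with made-up soil-type codes 1, 2, 3.
soil = np.array([
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 1, 2],
    [3, 3, 1, 1],
])
soil_layer = predominant_class(soil, 2)
```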
In some embodiments, a forest cover may be a grid layer computed, for example, from thematic maps delineating forest areas from non-forest areas (water, open land, residential areas) and preferably including the predominant tree type (coniferous or deciduous).
In some embodiments, silvicultural input data may be transformed from forest stand-level data to grid-level data by assuming an equal distribution of the parameters throughout the forest stand, so that all grid cells contained in the area of the forest stand are assigned identical silvicultural attributes. The following provides some examples of grid layers that may be calculated based on silvicultural data: the year when the forest was planted, the quantity of seedlings per species, and the sufficiency of thinning operations.
In some embodiments, one or more grid layers may be derived from a digital surface model. These grid layers may comprise one or more of the following data per grid cell:
In some embodiments, climate grid variables may be pre-computed from climate reanalysis data covering, for example, the last 15 years (such as the ECMWF ERA5 data set with a grid size of 31×31 km). The following grid layers may be computed with the grid resolution of the reanalysis data and later re-sampled to the grid used in the regression algorithm:
In some embodiments, new grid layers may be added later to further enrich the input data. The machine learning algorithm used by the forest stand target attribute prediction system is capable of accommodating such additional layers.
In some embodiments, if a grid layer is missing or is incomplete for a certain region in which predictions are made, a default value may be used for these gaps:
The grid layers 400A, 400B, 400C, 400T may then be converted into a plurality of feature vectors 406A, 406B, 406C, 406D. Each feature vector 406A, 406B, 406C, 406D is a vector of scalar input variables associated with a single grid cell. For example, the feature vector 406A comprises values A1,1, B1,1, C1,1 and T1,1 associated with grid cells 402A1,1, 402B1,1, 402C1,1, and 402T1,1.
In some embodiments, the indirect indicator data may comprise time series for each of N input variables for a cell. An optimal aggregation function for computing a single derived input variable from a subset of up to N input variables may be calculated from the time series of these input variables for each cell, so that the aggregation function maximizes a correlation between the single derived input variable and the forest stand target attribute. Further, the aggregation function may be applied to all cells of the grid for computing a derived input grid layer. The input variable here may relate, for example, to temperature data (low/high/average), precipitation data (for example, the amount of rain), average humidity, depth of snow cover, average solar irradiance, average wind speed etc. Each cell could, for example, comprise a time series of daily values for these input variables over a period of 15 years. An example for a derived input variable could be “warm/light/moist days per year”, and its aggregation function could be defined as the average number of days per year with a daily low temperature above X and daily solar irradiance above Y and total rain amount above Z during the preceding month. An optimal aggregation function for this derived input variable would be the aggregation function described above whereby X, Y, and Z are chosen so that the Pearson correlation coefficient across all cells between the cell's forest target attribute (where known) and the corresponding derived input variable is maximized.
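The threshold selection described above may be illustrated with a simple exhaustive search. This is a sketch under stated assumptions only: the daily series are held as NumPy arrays of shape (cells, days), the total rain amount of the preceding month is assumed to be pre-computed per day, and the candidate thresholds are supplied by the caller. The names `days_per_year`, `best_thresholds` and all sample values are hypothetical, and a grid search stands in for whichever optimization method is actually used.

```python
import itertools
import numpy as np

def days_per_year(low_temp, irradiance, rain_prev_month, x, y, z, years):
    """Average number of days per year on which all three thresholds are exceeded."""
    qualifying = (low_temp > x) & (irradiance > y) & (rain_prev_month > z)
    return qualifying.sum(axis=1) / years

def best_thresholds(low_temp, irradiance, rain_prev_month, target, years, candidates):
    """Grid-search X, Y, Z to maximize the Pearson correlation with the known targets."""
    known = ~np.isnan(target)              # only cells with a known target attribute
    best_params, best_r = None, -np.inf
    for x, y, z in itertools.product(*candidates):
        derived = days_per_year(low_temp, irradiance, rain_prev_month, x, y, z, years)
        if np.std(derived[known]) == 0:
            continue                       # correlation is undefined for a constant
        r = np.corrcoef(derived[known], target[known])[0, 1]
        if r > best_r:
            best_params, best_r = (x, y, z), r
    return best_params, best_r

# Made-up data: 5 cells, 4 days, 1 year; the last cell has no known target value.
low_temp = np.array([[6, 6, 6, 6], [6, 6, 1, 1], [6, 1, 1, 1],
                     [1, 1, 1, -1], [6, 6, 6, 6]], dtype=float)
irradiance = np.full((5, 4), 100.0)
rain_prev_month = np.full((5, 4), 50.0)
target = np.array([4.0, 2.0, 1.0, 0.0, np.nan])
params, r = best_thresholds(low_temp, irradiance, rain_prev_month, target, 1,
                            ([0.0, 5.0], [50.0], [10.0]))
```

Once the thresholds are chosen, `days_per_year` can be applied to all cells of the grid, including those without a known target value, to produce the derived input grid layer.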
Feature vectors may be separately generated for each forest stand target attribute based on the plurality of grid layers associated with the input variables.
Machine learning-based regression models for predicting forest attributes may be trained and validated with empirical measurement data which have been gathered from forest stands or in a location where wood is processed. The empirical measurement data may comprise one or more of the following:
The empirical measurement data may be selected to include all trees of one or more categories (for example, pulp wood and/or saw logs) of a harvested forest stand, to ensure that the empirical measurement data is representative of the entire forest stand and matches the scope of the regression algorithm, which is also the whole tree population of a forest stand. In those embodiments using harvester data, only harvester data from clear cutting operations is considered, while data from thinning operations is not considered.
Since the regression algorithm operates on a grid-level for predicting forest stand target attributes from a vector of input variables (i.e. from the feature vectors 406A, 406B, 406C, 406D), the empirical measurement data at stand-level may be transformed to grid-level estimates of the empirical measurement data. The empirical data may relate, for example, to X-ray data or other quality measurement data. The transformation may be done by assuming an equal distribution of forest stand target attributes throughout the forest stand area, so that all grid cells contained in this area get identical forest stand target attributes, if no further geo-references are contained in the measurements.
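The equal-distribution transformation from stand level to grid level may be sketched as follows. The sketch assumes that a grid of stand identifiers is available in which each cell holds the identifier of the forest stand containing it, with -1 marking cells outside any measured stand. The function name `stand_to_grid`, the identifiers and the attribute values are hypothetical.

```python
import numpy as np

def stand_to_grid(stand_id_grid, stand_values):
    """Assign each stand-level value to every grid cell contained in that stand.

    stand_id_grid: 2-D integer array holding the stand identifier of each cell.
    stand_values: dict mapping a stand identifier to its stand-level attribute value.
    Cells belonging to no listed stand are left as NaN (no target value known).
    """
    grid = np.full(stand_id_grid.shape, np.nan)
    for stand_id, value in stand_values.items():
        grid[stand_id_grid == stand_id] = value
    return grid

# Illustrative 2x3 grid with two measured stands (made-up volume per hectare values).
stand_ids = np.array([[0, 0, -1],
                      [1, 1, -1]])
target_layer = stand_to_grid(stand_ids, {0: 250.0, 1: 310.0})
```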
In some embodiments, values of the forest stand target attribute may be determined based on the X-ray data obtained, for example, from saw mills. The forest stand target attribute may represent a log quality parameter derived from X-ray images of logs, for example, density of year rings, density of knots, distance between knots, and other irregularities in the wood structure.
In some embodiments, empirical measurement data associated with a specific geographical location is attributed to a respective cell covering the specific geographical location. More specifically, measurements from harvester operations may comprise location-information about individual trees. The measurements may be attributed to the respective grid cell covering the location of the tree. This ensures a higher correlation between the input variables of the grid cell to the measurements, and consequently a higher accuracy of the resulting predictions.
In some embodiments, the location information of trees from harvester measurements may have a tolerance (i.e. a potential inaccuracy) larger than, for example, 20% of the size of a grid cell (i.e. >3-6 meters). In this situation, the attributes of a tree, in particular its volume, may be distributed between the grid cell and its eight adjacent cells using, for example, a weighted box filter. The weights of the filter may be chosen to reflect the area of each cell covered by an imaginary grid cell co-centric with the cell containing the tree and having a size of (N+T)×(N+T) meters, where N is the side length of a grid cell in meters and T is the tolerance of the tree location in meters.
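One way to derive such filter weights is sketched below, taking N to be the side length of a grid cell. An imaginary cell of side N+T centred on the containing cell overlaps that cell over the full length N and each neighbour over T/2 along each axis, so the overlap areas follow from an outer product. The function names and the cell size of 16 m are assumptions made for illustration, and the sketch is limited to tolerances of at most 2N, for which the imaginary cell stays within the eight adjacent cells.

```python
import numpy as np

def box_filter_weights(n, t):
    """3x3 weights for a tree located in the centre cell.

    n: side length of a grid cell in metres; t: location tolerance in metres.
    """
    if not 0 <= t <= 2 * n:
        raise ValueError("tolerance must satisfy 0 <= t <= 2n")
    # Overlap length of the (n+t)-wide imaginary cell with the previous, own and next cell.
    overlap = np.array([t / 2, n, t / 2])
    areas = np.outer(overlap, overlap)     # overlap area per cell, total (n+t)^2
    return areas / areas.sum()

def distribute_volume(grid, row, col, volume, n, t):
    """Add a tree's volume to the cell (row, col) and its neighbours, in place."""
    weights = box_filter_weights(n, t)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            i, j = row + di, col + dj
            # Shares falling outside the grid are dropped in this simple sketch.
            if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
                grid[i, j] += volume * weights[di + 1, dj + 1]
    return grid

# Illustrative case: 16 m cells, 4 m tolerance, one tree of 2.0 m3 in the centre cell.
w = box_filter_weights(16.0, 4.0)
volume_grid = distribute_volume(np.zeros((3, 3)), 1, 1, 2.0, 16.0, 4.0)
```

With these illustrative values the centre cell receives 64% of the volume, each edge neighbour 8% and each corner neighbour 1%.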
The regression model for a forest stand target attribute may be trained using an error-minimizing, non-linear machine learning algorithm, such as an Artificial Neural Network, Decision Tree, Random Forest, or Gradient Boosted Trees, or any algorithm which can handle a hundred or more, potentially collinear, input variables.
For predicting forest stand attributes which are not numeric by nature but categorical, such as the predominant tree species of a stand, where typically a classification algorithm could be applied, the preferred approach is to express the attribute using a combination of related numerical attributes. For example, the predominant species can be expressed as the species with the largest volume in a given stand, or the species with the highest probability of being the predominant species.
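The largest-volume variant may be illustrated as follows. The sketch assumes that a separate regression model has already produced a per-species volume prediction for every grid cell of a stand. The function name `predominant_species`, the species list and the volumes are hypothetical.

```python
import numpy as np

def predominant_species(predicted_volumes, species):
    """Derive the categorical attribute from numerical per-species predictions.

    predicted_volumes: array of shape (n_cells, n_species) for the cells of one stand.
    species: list of species names, in the column order of predicted_volumes.
    """
    totals = predicted_volumes.sum(axis=0)          # stand-level volume per species
    winner = species[int(np.argmax(totals))]
    return winner, dict(zip(species, totals.tolist()))

# Made-up per-cell volume predictions for a stand covering three grid cells.
species = ["pine", "spruce", "birch"]
volumes = np.array([[120.0, 200.0, 30.0],
                    [150.0, 180.0, 20.0],
                    [90.0, 210.0, 40.0]])
winner, totals = predominant_species(volumes, species)
```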
In some embodiments, the regression algorithm for forest stand target attributes at stand-level may be validated using leave-one-out cross-validation on the measurements for entire stands. More specifically, for each round of validation a train/test split may be created on the empirical data at grid-level so that all grid-level data from exactly one stand are used as the test set, and the remaining data are used as the training set for one instance of a grid-level regression model.
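The leave-one-stand-out splitting may be sketched as follows. To keep the example self-contained, a plain least-squares linear model stands in for the non-linear regression algorithm. This substitution, the function names and the sample data are assumptions made for illustration only.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in model: ordinary least squares with an intercept term."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def leave_one_stand_out(X, y, stand_ids, fit, predict):
    """Hold out all cells of one stand per round and report the stand-level error."""
    errors = {}
    for stand in np.unique(stand_ids):
        test = stand_ids == stand                  # all grid cells of exactly one stand
        model = fit(X[~test], y[~test])            # train on the remaining stands
        predicted = predict(model, X[test])
        # Stand-level error: mean prediction minus mean measurement over the stand.
        errors[int(stand)] = float(predicted.mean() - y[test].mean())
    return errors

# Made-up data: six cells in three stands, with an exactly linear target.
X = np.arange(6, dtype=float).reshape(6, 1)
y = 2.0 * X[:, 0] + 1.0
stand_ids = np.array([0, 0, 1, 1, 2, 2])
errors = leave_one_stand_out(X, y, stand_ids, fit_linear, predict_linear)
```

Splitting by stand rather than by cell keeps the highly similar cells of one stand from appearing in both the training set and the test set.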
In some embodiments, to minimize the prediction error determined via the cross-validation, the regression algorithm may be tuned using, for example, automatic hyperparameter optimization. To find the best hyperparameters with the least computing resources, a Bayesian hyperparameter optimization approach may be used.
When the feature vectors 406A, 406B, 406C, 406D are used with the training algorithm for the forest stand target attribute 502, ultimately a forest stand target attribute trained model is obtained. In some embodiments, a forest stand target attribute model is generated separately for each forest stand target attribute.
The final grid-level algorithm may further be validated using empirical data from sample plots, where such data are available. Sample plot data may be country-wide field measurements of small forest plots (typically 100 to 250 m²). The purpose of this validation is to compare prediction accuracy with existing, country-wide estimations of certain forest attributes.
At 600, a set of input feature vectors is constructed following the principles described in
Further, it is possible to predict, for a large number of forest stands, values of multiple forest stand target attributes based on the trained models for the forest stand target attributes and then to apply at least one search criterion to find at least one forest stand matching at least one of the search criteria or matching all search criteria simultaneously. This enables a user to find the forest stand that best matches the user's needs.
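Applying search criteria to predicted attributes may be sketched as follows. The attribute names, stand identifiers, predicted values and the function `find_matching_stands` are hypothetical and serve only to illustrate matching all criteria versus matching at least one criterion.

```python
def find_matching_stands(predictions, criteria, require_all=True):
    """Return the identifiers of stands whose predicted attributes meet the criteria.

    predictions: dict mapping a stand identifier to a dict of predicted attribute values.
    criteria: dict mapping an attribute name to a predicate on its predicted value.
    require_all: match all criteria simultaneously if True, at least one if False.
    """
    combine = all if require_all else any
    return [
        stand_id
        for stand_id, attrs in predictions.items()
        if combine(name in attrs and check(attrs[name]) for name, check in criteria.items())
    ]

# Made-up predictions for three stands.
predictions = {
    "stand-1": {"spruce_volume_m3": 1200.0, "avg_diameter_cm": 28.0, "distance_km": 90.0},
    "stand-2": {"spruce_volume_m3": 400.0, "avg_diameter_cm": 31.0, "distance_km": 60.0},
    "stand-3": {"spruce_volume_m3": 1500.0, "avg_diameter_cm": 30.0, "distance_km": 210.0},
}
criteria = {
    "spruce_volume_m3": lambda v: v >= 1000.0,
    "avg_diameter_cm": lambda v: v >= 25.0,
    "distance_km": lambda v: v <= 150.0,
}
matches_all = find_matching_stands(predictions, criteria)
matches_any = find_matching_stands(predictions, criteria, require_all=False)
```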
The illustrated system or apparatus 800 can also include a memory or memories 804. The memory 804 can include a non-removable memory and/or a removable memory. The non-removable memory can include RAM, ROM, flash memory, a hard disk, or other well-known memory storage technologies. The removable memory can include flash memory or other well-known memory storage technologies. The memory can be used for storing data and/or code for running an operating system and/or one or more applications.
The system or apparatus 800 may be configured to implement the various features, examples and embodiments illustrated, for example, in
According to an example, the processor 802 may be configured by the program code which when executed performs the examples and embodiments of the operations and functionality described. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs). The system or apparatus 800 may additionally include components and elements not disclosed in
At least some of the aspects and embodiments discussed above enable at least one of the following:
Further, at least some of the aspects and embodiments discussed above may also allow more accurate silviculture management and precision harvesting techniques, and more accurate valuation of forest assets. Further, at least some of the aspects and embodiments discussed above may also enable owners of forest to get more accurate valuations of their assets and to better utilize their assets to meet market demands. Further, at least some of the aspects and embodiments discussed above may also enable purchasers of forest inventory to obtain a more accurate prediction of characteristics of various forest stands both for valuation purposes and to determine how well the inventory suits the intended processing purpose.
Any combination of the illustrated components disclosed in
Further, any combination of the illustrated components disclosed in
Those skilled in the art should understand that each step or operation, or any combination of the steps or operations mentioned above, can be implemented by various means, such as hardware, firmware, and/or software. As an example, one or more of the steps or operations described above can be embodied by computer or processor executable instructions, data structures, program modules, and other suitable data representations. Furthermore, the computer executable instructions which embody the steps or operations described above can be stored on a corresponding data carrier and executed by at least one processor like the processor 802 included in the apparatus 800. This data carrier can be implemented as any computer-readable storage medium configured to be readable by said at least one processor to execute the computer executable instructions. Such computer-readable storage media can include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, the computer-readable media comprise media implemented in any method or technology suitable for storing information. In more detail, the practical examples of the computer-readable media include, but are not limited to, information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic tape, magnetic cassettes, magnetic disk storage, and other magnetic storage devices.
Although the exemplary embodiments of the present invention are disclosed herein, it should be noted that various changes and modifications could be made in the embodiments of the present invention without departing from the scope of legal protection which is defined by the appended claims. In the appended claims, the mention of elements in a singular form does not exclude the presence of a plurality of such elements, if not explicitly stated otherwise.
Number | Date | Country | Kind |
---|---|---|---|
20185930 | Nov 2018 | FI | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/FI2019/050767 | 10/28/2019 | WO | 00 |