The present disclosure relates to a system and a method for image-based remote sensing of crop plants.
Efficient, precise and timely measurement of crop plant traits is important in the assessment of a breeding population. Modern plant breeding strategies rely on efficient genotyping and phenotyping for improving yield, that is, by studying the phenotypic responses in diverse germplasm.
The traditional phenotyping method to measure crop plant traits such as above-ground biomass (AGB) includes destructively harvesting samples of the crop to measure fresh weight (FW), and oven drying the harvested samples to measure dry weight (DW). Retrieving the AGB measures such as the DW and the FW has remained challenging under field conditions with a variety of genotypes having diverse growth responses, including during reproductive and senescence stages. Using the traditional method, to investigate the biomass of a diverse number of genotypes in field breeding research, a large number of plants needs to be harvested regularly over different growth stages, which is very time consuming and labour intensive. Moreover, as this method is destructive, it is impossible to take multiple measurements on the same plot at different time points. Although some image processing methods and systems have been proposed for remote sensing of crops, developing high-performance algorithms for certain tasks with existing tools remains time-consuming, resource expensive and relies heavily on human expertise and trial-and-error processes.
It is desired to address or ameliorate one or more disadvantages or limitations associated with the prior art, or to at least provide a useful alternative.
Disclosed herein is a method for image-based remote sensing of crop plants, the method including:
The SfM method can include using green bands of the multispectral images as a reference band.
The SfM method can include geometrically registering the orthomosaic reflectance map with the DSM and the DTM using one or more ground control points (GCPs) in the images adjacent to or in the crop.
The method can include determining the CC (optionally in the form of a CC layer) from a fusion of the OSAVI and the CHM. The fusion of the OSAVI (optionally in the form of an OSAVI layer) and the CHM (optionally in the form of a CHM layer) can include a pixel-wise product of the OSAVI layer and the CHM layer.
The CHM and the multispectral orthomosaic reflectance map are complementary and both represent the same crop area.
Disclosed herein is a system for image-based remote sensing of crop plants, the system including an aerial data acquisition system with:
Disclosed herein is machine-readable storage media including machine readable instructions that, when executed by a computing system, perform a data-processing method including:
Disclosed herein is a method for image-based remote sensing of crop plants, the method including:
The crop plants may include wheat. The phenotypic characteristics of the crop plants include whether or not there is wheat lodging (i.e., a classification task), and/or a level (i.e., measure) of the wheat lodging using a lodging estimator of levels in the images, i.e., a regression task.
The method may include generating an orthomosaic image of the acquired images with measured coordinates of one or more ground control points (GCPs) used for geo-rectification.
Disclosed herein is a system for image-based remote sensing of crop plants, the system including an aerial data acquisition system with:
The aerial data acquisition system can include a geotagging module to geotag each image.
The system can include one or more ground control points (GCPs) for geometric correction of the geotagging module.
Disclosed herein is machine-readable storage media including machine readable instructions that, when executed by a computing system, perform a data-processing method including:
Disclosed herein is a method including:
Disclosed herein is a system including computing system configured to perform a data-processing method including:
Disclosed herein is machine-readable storage media including machine readable instructions that, when executed by a computing system, perform a classification task, including:
Some embodiments of the present invention are hereinafter described, by way of example only, with reference to the accompanying drawings, in which:
[The individual figure descriptions are not reproduced here; among the drawings is a figure showing a model with 43,895 parameters after 50 and 100 trials.]
Disclosed herein is a method for image-based remote sensing of crop plants. Also disclosed herein is a system configured to perform the method.
The method (also referred to herein as the ‘fusion method’) includes:
The three fusion steps may provide substantially improved accuracy. The orthomosaic reflectance map is a spectral representation of the crop, and the DSM and the DTM are structural representations of the crop, so the orthomosaic may be referred to as a “spectral layer”, and the DSM and the DTM may be referred to as “structural layers”. The method thus combines the spectral layer and the structural layers in one of the fusion steps.
The SfM method can include using green bands of the multispectral images as a reference band.
The method can include determining the CC (in the form of a CC layer) from a fusion of the OSAVI and the CHM. The fusion of the OSAVI (in the form of an OSAVI layer) and the CHM (in the form of a CHM layer) can include a pixel-wise product of the OSAVI layer and the CHM layer.
The so-called intermediate traits, including crop height model (CHM), crop coverage (CC) and crop volume (CV), may be capable of inferring important agronomic insights in high-throughput breeding research for screening genotypes or identifying genotypic markers governing the fundamental response of the genotypes. The method can include determining a measurement of crop yield based on: the crop coverage (CC) values; the crop volume (CV) values; and/or the biomass or above-ground biomass (AGB) values, which can be represented by a measure of a total dry weight (DW) or a total fresh weight (FW) of organic matter per unit area at a given time. Above-ground crop biomass is an important factor in the study of plant functional biology and growth; it is the basis of vigour and net primary productivity, and may be crucial for monitoring grain yield.
The CHM and the multispectral orthomosaic reflectance map are complementary because they both represent the same crop area.
The DTM may be referred to as the digital elevation model (DEM).
The system (also referred to herein as the ‘fusion system’) includes an aerial data acquisition system with:
The UAV is a small UAV. The UAV may be an off-the-shelf small UAV in the form of a quadcopter, e.g., as shown in
The data acquisition system includes a switching mode power supply to step down the output voltage of the UAV to power the multispectral sensor.
The data acquisition system includes a gravity-assisted gimbal bracket (3D printed) that fastens and attaches the multispectral sensor to the gimbal mount.
The data acquisition system records position values of the images, i.e., latitude, longitude and altitude, on multispectral sensor tags using the GPS module.
The multispectral sensor may be an off-the-shelf multispectral camera.
The multispectral sensor logs dynamic changes in incident irradiance levels using a downwelling light sensor (DLS).
The multispectral sensor may have five spectral bands: blue (475 nm), green (560 nm), red (668 nm), red edge (717 nm), and near-infrared (840 nm).
The multispectral sensor measures at-sensor radiance, the radiant flux received by the sensor. The at-sensor radiance is a function of surface radiance (i.e., flux of radiation from the surface) and atmospheric disturbance between the surface and the sensor, which may be assumed to be negligible for UAV-based surveys.
The data acquisition system is configured to trigger the multispectral sensor to acquire the images at a selected height above the crop, e.g., 30 m to provide a ground sampling distance (GSD) of 2 cm. The data acquisition system is configured to trigger the multispectral sensor to acquire adjacent images based on location from the GPS module with a selected overlap between adjacent images, e.g., 85% forward and side overlap.
The system includes at least one radiometric calibration panel (also referred to as a “standardised reflectance panel”) with known radiometric coefficients for individual multispectral bands. The method includes the multispectral sensor recording radiometric calibration measurements from the radiometric calibration panel before individual flight missions. The method includes image correction of the acquired images using the radiometric calibration measurements.
The camera records at-sensor radiance measurements for each band in dynamically scaled digital numbers (DNs) at a predetermined bit depth, thus forming the raw images. The method includes using the at least one radiometric calibration panel to establish a linear empirical relationship between the DNs and surface reflectance during the survey by measuring the surface reflectance of the radiometric calibration panel under consistent illumination conditions during the survey. In the method, the logs from the onboard DLS are used to account for changes in irradiation during the acquisition of the images (by application of the linear empirical relationship to the raw DNs).
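By way of illustration only, the following Python sketch shows one possible implementation of the empirical-line conversion of raw DNs to surface reflectance described above, with a DLS-based illumination factor; the function and variable names (e.g., dn_to_reflectance, panel_dn) are hypothetical, a zero offset is assumed for the linear relationship, and the exact computation performed by the photogrammetry application may differ.

```python
import numpy as np

def dn_to_reflectance(dn_band, panel_dn, panel_reflectance,
                      irradiance_at_panel, irradiance_at_capture):
    """Convert raw digital numbers (DNs) of one band to surface reflectance.

    A linear empirical relationship (gain only; zero offset assumed) is derived
    from an image of the calibration panel with known reflectance, and the DLS
    irradiance logs scale for changes in illumination between the panel capture
    and the image capture.
    """
    gain = panel_reflectance / float(np.mean(panel_dn))                 # empirical-line gain
    illumination_factor = irradiance_at_panel / irradiance_at_capture   # DLS-based correction
    return dn_band.astype(np.float32) * gain * illumination_factor

# Hypothetical usage with a raw band, panel pixels and DLS irradiance readings:
# reflectance = dn_to_reflectance(raw_band, panel_pixels, 0.49, dls_panel, dls_image)
```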
The method includes acquiring the mutually overlapping images of a field site (also referred to as a ‘study area’) that includes the crop plants.
The system includes one or more ground control points (GCPs) for geometric correction of the GPS module, for example 5 GCPs. The system can include a high-precision positioning receiver to measure respective locations of the GCPs based on a multi-band global navigation satellite system (GNSS), e.g., a real-time kinematic (RTK) positioning receiver, with centimetre level precision, e.g., an accuracy of 0.02 m in planimetry and 0.03 m in altimetry. The positioning receiver may be an off-the-shelf receiver. The method may include installing the GCPs adjacent to or in the field site, and measuring the GCP locations within a day of acquiring the images, e.g., on the same day. The GCPs may be installed such that there is one GCP in the centre of the field site and one GCP at each of the four corners of the field site.
The system includes a computing system configured to receive the acquired images (i.e., the aerial multispectral images) from the aerial data acquisition system.
The computing system is configured to process the acquired images for modelling the DW and the FW by performing a data-processing method illustrated in
In the photogrammetry step, the photogrammetry application is used to correct the raw images (i.e., the surface radiance measurements) for the influence of incident radiation. In the photogrammetry step, the DNs are converted into absolute surface reflectance values (i.e., a component of the surface radiance independent of the incident radiation/ambient illumination) using: a linear empirical relationship between the DNs and the surface reflectance (wherein the linear relationship is determined by multispectral sensor images of the standardised reflectance panels under consistent illumination), and a time-dependent factor using the logs from onboard DLS sensor to account for changes in irradiation during the acquisition of the images. In the photogrammetry step, the computing system corrects optical (filters and lenses) aberrations and vignetting effects to maintain a consistent spectral response between images. The photogrammetry step may use an off-the-shelf photogrammetry application, e.g., Pix4D.
In the SfM step, composite images (of the orthomosaic, the DSM and the DTM) are generated by stitching hundreds of different calibrated images captured from mutually separate, individual flight missions, e.g., using the photogrammetry application. The SfM step combines the large number of images from a plurality of UAV missions, e.g., based on example SfM steps described in Harwin et al. (Harwin, S.; Lucieer, A. ‘Assessing the Accuracy of Georeferenced Point Clouds Produced via Multi-View Stereopsis from Unmanned Aerial Vehicle (UAV) Imagery’ in ‘Remote Sensing’ 2012, 4, 1573-1599, doi:10.3390/rs4061573). The SfM step includes a feature matching step using a scale-invariant feature transform (SIFT) to create optimized resection geometry for improving initial multispectral sensor position accuracy obtained through the recorded GPS tags. The SfM step includes optimizing the multispectral sensor parameters based on any matched triangulated target points between multiple images, wherein the number of matched points can be selected, e.g., at 10,000 points. The SfM step includes a bundle adjustment step to generate a sparse model that contains generalized keypoints that connect the images. The SfM step includes adding a plurality of fine-scale keypoints during reconstruction of a dense model, which may be crucial in improving geometric composition of the composite images. The SfM step includes using the known locations of the GCPs in the images collected during the aerial survey to geometrically register the orthomosaic and the models. The SfM step runs on a selected ‘reference’ band, which can be the ‘green’ bands of the multispectral images, as the raw images include primarily vegetation features. The SfM step connects identical features in the overlapping portion of adjacent images using the computed keypoints. The SfM step produces the composite reflectance orthomosaic, exported in rasterized (.tif) format. The SfM step recomputes bundle block adjustment to optimize the orientation and positioning of the underlying densified point cloud. The SfM step applies noise filtering to generate the DSM and the DTM. The SfM step applies a sharp surface smoothing to retain crop surface boundaries when generating the DSM and the DTM. The SfM step produces the DSM and the DTM of the field site exported in rasterized (.tif) format. The SfM step resamples the exported layers, namely the orthomosaic, the DSM and the DTM, using an inverse distance weighting function to a 2-cm ground sampling distance (GSD) to provide consistent pixel dimensions.
The spectral (orthomosaic) and structural (DSM and DTM) layers obtained from the multispectral sensor are used to compute different intermediate layers, which are fused at multiple processing levels in the three fusion steps.
The VI generation step uses the orthomosaic—i.e., the spectral reflectance images from the SfM step—to generate a plurality of values for vegetation indices (VIs). Each VI may be regarded as a spectral transformation of two or more multispectral bands to highlight a property of the crops. The VI values may be used for a reliable spatial and temporal inter-comparison of crop photosynthetic variations and canopy structural profiles. The VIs are immune from operator bias or assumptions regarding land cover class, soil type, or climatic conditions, and therefore may be suitable in high-throughput phenotyping. Seasonal, interannual, and long-term changes in crop structure, phenology, and biophysical parameters could be efficiently monitored using the VIs measured from a plurality of surveys over time. The VI generation step may compute a plurality of VIs, e.g., 12, using one or more of the equations listed in Table 1. The VI generation step generates at least the optimized soil adjusted vegetation index (OSAVI) in an OSAVI layer using known relationships, e.g., described in Fern et al. (Fern, R. R.; Foxley, E. A.; Bruno, A.; Morrison, M. L. ‘Suitability of NDVI and OSAVI as estimators of green biomass and coverage in a semi-arid rangeland’ in ‘Ecological Indicators’ 2018, 94, 16-21, doi:10.1016/j.ecolind.2018.06.029) or Rondeaux et al. (Rondeaux, G.; Steven, M.; Baret, F. ‘Optimization of soil-adjusted vegetation indices’ in ‘Remote Sensing of Environment’ 1996, 55, 95-107, doi:10.1016/0034-4257(95)00186-7).
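By way of illustration only, the following Python sketch computes an OSAVI layer from the red and near-infrared reflectance layers using the standard formulation of Rondeaux et al. (1996) with the soil adjustment factor Y=0.16; the array names are hypothetical and assume reflectance layers resampled to a common grid.

```python
import numpy as np

def osavi(nir, red, y=0.16):
    """Optimized soil adjusted vegetation index:
    OSAVI = (1 + Y) * (NIR - Red) / (NIR + Red + Y), with Y = 0.16."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (1.0 + y) * (nir - red) / (nir + red + y)

# Hypothetical usage with the NIR (840 nm) and red (668 nm) reflectance layers:
# osavi_layer = osavi(nir_layer, red_layer)
```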
The CHM step performs a pixel-wise subtraction of the DTM altitudes from the DSM altitudes to generate the crop height model (CHM), representing the relief of the entire crop surface. The accuracy of the CHM computed using the SfM step may rely on interacting factors including the complexity of the visible surface, resolution and radiometric depth, sun-object-sensor geometry, and type of sensor. As the canopy surface, e.g., for wheat, may be very complex containing reflectance anisotropy and micro-relief height variation, a moving filter (e.g., a 3×3 pixel local maximum moving filter) may be applied on the CHM layer to enhance the highest peaks and reduce the micro-variation. The implemented filter may move the pre-defined window over the CHM and replace the centre pixel's value with the maximum value in the window if the centre pixel is not the maximum in the window.
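By way of illustration only, the following Python sketch shows one possible implementation of the CHM step, using a pixel-wise subtraction and a 3×3 local maximum moving filter from SciPy; clipping negative heights to zero is an added assumption, and the filter implementation used in practice may differ.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def crop_height_model(dsm, dtm, window=3):
    """Pixel-wise CHM = DSM - DTM, followed by a local-maximum moving filter
    to enhance canopy peaks and reduce micro-relief variation."""
    chm = dsm.astype(np.float32) - dtm.astype(np.float32)
    chm = np.clip(chm, 0.0, None)              # assume negative heights are ground noise
    return maximum_filter(chm, size=window)    # 3x3 local maximum by default
```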
The CC step uses OSAVI to suppress background soil spectrum to improve the detection of vegetation. The CC step fuses the OSAVI layer and the CHM layer to create a CC layer to mask the extent of the vegetation for individual plots across all time points. In the CC step, individual segmentation layers are prepared for OSAVI and CHM using a dynamically computed threshold using an adaptive thresholding method for binarization in image processing, e.g., the Otsu method (described in Otsu, N. ‘A threshold selection method from gray-level histograms’ in ‘IEEE transactions on systems, man, and cybernetics’ 1979, V9, 62-66), in which the threshold is computed adaptively by minimizing intra-class intensity variance (i.e., between index values for OSAVI and height levels for CHM), or equivalently, by maximizing inter-class variance. The adaptive thresholding method returns a single threshold that separates pixels into two classes: vegetation and background. The adaptive thresholding method may filter unwanted low-value OSAVI and CHM pixels, corresponding to minor unwanted plants such as weeds or an undulating ground profile respectively. The CC step generates a pixel-wise product of the segmented OSAVI and CHM pixels to prepare the CC mask corresponding to vegetation, e.g., wheat. The CC step uses the fusion of the OSAVI layer and the CHM layer to resolve limitations that each layer has on its own. The OSAVI layer provides the ‘greenness’ of the crop (which ‘greenness’ drops during flowering and after maturity). The CHM provides the crop relief, which may be immune to changes in ‘greenness’ (so applicable during flowering and post-emergence of maturity), but suffers when the plants are too small (e.g., less than approximately 5 cm) as the crop canopy may be too fragile for the SfM step to generate a dependable CHM. These independent limitations are resolved in the CC step through the fusion of the OSAVI and CHM layers, improving classification of the CC of the crop.
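By way of illustration only, the following Python sketch shows one possible implementation of the CC step using Otsu's adaptive threshold from scikit-image (one of the source packages listed herein) and a pixel-wise product of the two binary segmentation layers; the array names are hypothetical.

```python
import numpy as np
from skimage.filters import threshold_otsu

def crop_coverage_mask(osavi_layer, chm_layer):
    """Binarize the OSAVI and CHM layers with Otsu's threshold and fuse them
    by a pixel-wise product to obtain the crop coverage (CC) mask."""
    osavi_mask = osavi_layer > threshold_otsu(osavi_layer)
    chm_mask = chm_layer > threshold_otsu(chm_layer)
    return (osavi_mask & chm_mask).astype(np.uint8)  # product of the binary masks
```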
The crop volume (CV) step computes the CV by multiplying the CHM and the CC pixelwise, then by summing the volume under the crop surface, e.g., using the formula in Equation 1:
where i and j represent the row and column number of the image pixels for an m×n image, i.e., the size of an individual plot. The multiplication of the CHM with the CC layer in the CV step may mitigate errors from ground surface undulations and edge-effects in surface reconstructed in the SfM step.
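By way of illustration only, the following Python sketch computes the CV for one plot by summing the pixel-wise product of the CHM and the CC mask over the m×n plot pixels, consistent with Equation 1; scaling by the squared ground sampling distance to express the volume in cubic metres is an added assumption.

```python
import numpy as np

def crop_volume(chm, cc_mask, gsd=0.02):
    """Crop volume for one plot: sum of CHM x CC over all plot pixels,
    multiplied by the ground area of a pixel (gsd squared, here a 2 cm GSD).
    Assumes the CHM is in metres and the CC mask is binary (0/1)."""
    return float(np.sum(chm * cc_mask) * gsd ** 2)
```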
The dry weight (DW) step uses a linear model relationship, e.g., as in Equation (2), to compute the DW:
DW=α·CV+β (2)
where coefficients slope (α) and bias (β) are measured parametrically using measured (ground truth) DW values, which may be constant for a selected crop type (e.g., wheat) and constant sowing rate. The DW step provides DW using CV through non-invasive and non-destructive means applicable for field high-throughput phenotyping.
The fresh weight (FW) step computes the FW by fusing the CV with the set of derived VIs. CV is a canopy structural metric computed through the SfM step and estimates the dry tissue content or DW, but is void of the ability to infer the fresh tissue water content or FW; however, VIs are reflectance-derived biophysical parameters having the ability to infer photosynthetic variations and related water potential in vegetation. The fusion of the CV and the VIs resolves the limitations of the individual parameters. The fusion in the FW step can use a mathematical product in a relationship shown in Equation 3:
FW=α·CV×VIs+β (3)
where the slope (α) and the bias (β) have the same roles as in Equation (2), and where the model coefficient values (i.e., the slope (α) and the bias (β)) vary corresponding to the different VIs.
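By way of illustration only, the following Python sketch fits the slope (α) and bias (β) of Equations (2) and (3) by ordinary least squares against ground truth plot measurements; the per-plot arrays (cv, vi, dw_ground_truth, fw_ground_truth) are hypothetical.

```python
import numpy as np

def fit_linear_model(x, y):
    """Least-squares fit of y = alpha * x + beta; returns (alpha, beta)."""
    alpha, beta = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), 1)
    return alpha, beta

# Hypothetical per-plot values: crop volume (cv), a chosen VI (vi) and ground truth weights.
# alpha_dw, beta_dw = fit_linear_model(cv, dw_ground_truth)       # Equation (2): DW = a*CV + b
# alpha_fw, beta_fw = fit_linear_model(cv * vi, fw_ground_truth)  # Equation (3): FW = a*(CV x VI) + b
```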
The hereinbefore-described steps of the data-processing module are performed by the computing system executing machine readable instructions (in non-transient storage media) that are defined by the hereinbefore-described steps. The instructions can be generated from source code written in Python 3.7.8 (Python Software Foundation. Python Language Reference) using source packages including os, fnmatch, matplotlib, numpy, Fiona, shapely, skimage, opencv2, rasterio, and geopandas.
A shapefile (.shp) consisting of the individual field plot information can be created in the method using an off-the-shelf application.
The coded method steps of the data-processing module include generation of the intermediate geospatial layer corresponding to individual traits, clipping the layers to plot geometries, summarizing the traits in individual plots, and analyzing and validating the summarized traits.
The method and system may be used for phenotyping germplasm in wheat and other crop species. The method and system may be used for estimating crop parameters such as for AGB mapping, and for measuring leaf area index, chlorophyll, nitrogen content, lodging, plant density estimates, counting wheat-ear numbers, etc. The method and system may be substantially accurate and may correlate with AGB in reproductive and post-reproductive growth stages (i.e., the example model relationship is valid across different growth stages including during and post-reproductive periods) in breeding trials with wide diversity in growth responses between genotypes (i.e., a variety of genotypes with wide diversity in growth).
The method and the system may improve upon alternative high-throughput technologies that use reflectance spectra because reproductive growth does not relate to reflectance spectra, and multiple growth stages exist concurrently in diverse genotypes. The method and the system may improve upon traditional technologies using vegetation indices (VIs) that may saturate at high canopy coverage (including in the red and near-infrared spectral bands), that may fail to capture the plants' vertical growth profile, and that may lose their sensitivity in the reproductive growth stages.
The system and method can provide or support high-throughput phenotyping that is more cost and time-efficient than traditional phenotyping, thus supporting crop breeding programmes. Relevant application areas may include nitrogen use efficiency, water use efficiency, heat tolerance, salt tolerance, insect damage resistance, yield forecasting, and other allied application areas. The fusion system and method may be time-efficient and cost-effective compared to other methods employing secondary sensor systems such as SAR and LiDAR in addition to a multispectral sensor, or a dedicated hyperspectral sensor to compute narrow band VIs. The fusion system and method include the fusion of complementary spectral and structural information derived using the same multispectral sensor to provide a suitable and robust alternative to traditional methods involving purely spectral VIs. The fusion-based data-processing method uses intermediate parameters or metrics or traits (i.e., the VIs, the CHM, the CC, and the CV), and the interaction between these intermediate parameters at different levels, to provide improved accuracy of parameters at successive steps/stages, thereby developing the model relationship for DW and FW. As described herein, the fusion steps are: (i) the pixel-wise product between the OSAVI segment layer and the CHM segment layer to derive the CC; (ii) the mathematical product between the CC and the CHM followed by summation over the plot area to calculate the CV relating to the DW; and (iii) the multiplication between the CV and the VIs to retrieve the FW (as shown in
The generated CHM and CC may be beneficial agronomic traits. Traditionally, for short crops such as wheat, plant height is measured using a ruler in the field by selecting a single or a few representative plants to represent the canopy height. This traditional method is labour intensive, time-consuming and expensive for large breeding trials. Measuring variation in crop height associated with growth at finer temporal resolution (intervals shorter than one week) remains largely impractical in widely distributed field trials. The CHM layer derived herein achieved satisfactory model correlation in estimating crop height in plots (as shown in
In an example, the implemented method outperformed approaches based on commonly used vegetation indices (VIs) derived from multispectral orthomosaics across all growth conditions and variability, e.g., spectral VI based approaches that model biomass/AGB using simple regression techniques.
In an example, the intermediate metrics, CHM (R2=0.81, SEE=4.19 cm) and CC (OA=99.2%, κ=0.98), correlated well with equivalent ground truth measurements, and the metrics CV and CV×VIs were used to develop an effective and accurate model relationship with FW (R2=0.89 and SEE=333.54 g/m2) and DW (R2=0.96 and SEE=69.2 g/m2).
An experimental example of the method and system was used at and for a site located in a mild temperate climate that receives approximately 448 mm average rainfall annually and has predominantly Self-mulching Vertosol soil type. The experiment comprised 20 wheat genotypes with four replications, each planted in a 5 m × 1 m plot, with a density of approximately 150 plants per m2. Five aerial flights were undertaken at 30, 50, 90, 130 and 160 days after sowing (DAS). For comparison with the experimental system, a range of in situ data were collected concurrently with use of the method, including: visual assessment of plant condition in plots, measurements of plant height at two time points (130 and 160 DAS), and harvesting plot replicates at four time points (50, 90, 130 and 160 DAS) for destructively measuring FW and DW biomass.
The experimental system included the data acquisition system.
In an experimental example, the plant heights of four replicates of 20 lines in 80 plots (i.e., 20 wheat genotypes, each with four replications, planted in 5 m × 1 m plots, with a density of approximately 150 plants per m2) were manually measured at the two time points, 130 DAS and 160 DAS, totaling 160 ground truth height observations. Four representative wheat plants from each experimental plot were measured using a measuring staff from the ground level to the highest point of the overall plant; the average of the four height measurements was used as a representative ground truth measurement for the corresponding plot. Plant heights in the breeding population ranged from 54 to 91 cm on 130 DAS and 62 to 98 cm on 160 DAS with a normal distribution (Kolmogorov-Smirnov test, P<0.001). The mean plant heights were 71.8 and 78.9 cm on 130 DAS and 160 DAS, respectively. To evaluate the performance of the SfM derived CHM with respect to the ground truth plot height measurements, a correlation-based assessment was sought (
In an experimental example, to potentially validate the fusion-based steps described hereinbefore,
In an experimental example, a more rigorous approach to evaluate the achieved classification accuracy for the CC layer was performed through a comparison of CC classified labels against ground truth across randomly selected locations using a confusion or validation matrix (shown in Table 2). Over the five time points, a total of 1500 ground truth points (i.e., 300 points in each time point) were generated between the two classes: wheat CC and ground using an equalized stratified random method, creating points that are randomly distributed within each class, where each class has the same number of points. The ground truth corresponding to each point for validation was captured manually through expert geospatial image interpretation training using high-resolution (2 cm) RGB composite orthomosaic images. Accuracy measures namely producer's accuracy, user's accuracy, overall accuracy (OA) and kappa coefficient (κ) were computed using the confusion matrix. The classification class CC, achieved a user's accuracy of 98.9%, producer's accuracy of 99.4% and overall accuracy (OA) of 99.2%. In traditional accuracy classification, the producer's accuracy or ‘error of omission’ refers to the conditional probability that a reference ground sample point is correctly mapped, whereas the user's accuracy or ‘error of commission’ refers to the conditional probability that a pixel labelled as a class in the map actually belongs to that class. Overall accuracy refers to the percentage of classified map that is correctly allocated, used as a combined accuracy parameter from a user's and producer's perspective. In addition to the traditional estimates, the classification achieved a kappa coefficient (κ) of 0.98, which is an indicator of agreement between classified output and ground truth values. A kappa value ranges between 0 and 1 representing the gradient of agreement between ‘no agreement’ and ‘perfect agreement’.
In an experimental example, the hereinbefore-described modelling of the DW using the CV was compared against traditional VI based approaches. The comparison shows the R2 values obtained for predicted vs. observed DW values when using CV and VIs, across different time points. The linear regression demonstrates that the degree of correlation (in terms of R2) for the VI based approaches in modelling DW becomes lower at progressive time points from sowing, while the accuracy of the CV based approach for modelling DW remains consistently high (as shown in
In an experimental example, the modelled FW using CV×VIs products was evaluated using different CV and VIs combinations, and against independent VIs. For evaluating different CV×VIs combinations, a plot of the corresponding R2 values obtained for predicted vs. observed FW values across different time points was used (as shown in
In an experimental example, the DW and the FW generated from the method were used in scoring the performance of selected genotypes. Predicted crop biomass (DW and FW) estimated across the four dates showed expected and consistent growth trends for wheat genotypes (as shown in
Disclosed herein is a method for image-based remote sensing of crop plants. Also disclosed herein is a system configured to perform the method.
The method (also referred to herein as the ‘neural network method’ or NN method) includes:
The crop plants may include wheat. The phenotypic characteristics of the crop plants include whether or not there is wheat lodging (i.e., a classification task), and/or a level (i.e., measure) of the wheat lodging using a lodging estimator of levels in the images, i.e., a regression task.
The system (also referred to herein as the ‘neural network system’ or NN system) includes the NAS-generated ANNs for the image-based remote sensing of crop plants.
The neural architecture search (NAS) module selects a well-performing neural architecture through selection and combination of various basic predefined modules from a predefined search space. The predefined modules are pre-existing modules configured to be combined in variable configurations to form a variety of convolutional neural networks (CNN). The NAS module is configured to: select a plurality of mutually different combinations of the predefined modules to define a corresponding plurality of mutually different CNNs; test the plurality of mutually different CNNs; and select at least one of the plurality of mutually different CNNs based on performance of the CNNs in the test.
Due to the NAS, the step of forming the ANN is automated, and thus this step may be referred to as including “automated machine learning” or “AutoML”.
The NN method and system may be used to provide high-throughput image-based plant phenotyping.
The NN system includes an aerial data acquisition system for the acquiring of the images, the aerial data acquisition system including:
The UAV is configured to support a payload of at least 1 kg, or at least 6 kg, or between 1 kg and 6 kg. The UAV may be an off-the-shelf UAV in the form of a quadcopter. The data acquisition system includes a gimbal bracket that fastens and attaches the optical camera to the gimbal mount. The optical camera may have a 35.9 mm × 24.0 mm sensor size, and/or a 42.4 megapixel resolution. The optical camera may have a 55 mm fixed focal length lens. The optical camera may capture at a 1 second shooting interval in JPEG format in shutter priority mode. The camera may be an off-the-shelf optical camera. The geotagging module may be an off-the-shelf geotagging module.
The NN system includes one or more (e.g., seven) black and white checkered square panels (38 cm×38 cm) distributed in the field to serve as ground control points (GCPs) for accurate geo-positioning of images. The method may include installing the GCPs adjacent to or in the field site, and measuring the GCP locations within a day of acquiring the images, e.g., on the same day. The GCPs may be installed such that there is one GCP in a centre of the field site, and at least one GCP on each edge of the field site (as shown in
The NN system includes a computing system configured to receive the acquired images (i.e., the aerial images) from the aerial data acquisition system. The computing system is configured to process the acquired images by performing pre-processing steps. The pre-processing steps include: generating an orthomosaic image of the acquired images with the coordinates of the GCPs used for geo-rectification (e.g., an orthomosaic with a ground sampling distance (GSD) of 0.32 cm/pixel) using an off-the-shelf software application; and clipping individual plot images from the orthomosaic, and storing them in TIFF format, using a field plot map with polygons corresponding to the selected plot dimensions (e.g., 5 m×1 m for the plot in
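By way of illustration only, the following Python sketch shows one possible implementation of the plot clipping step using geopandas and rasterio; the file names are hypothetical, and the shapefile is assumed to share the coordinate reference system of the orthomosaic.

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

plots = gpd.read_file("plot_map.shp")               # hypothetical field plot polygons
with rasterio.open("orthomosaic.tif") as src:        # hypothetical geo-rectified orthomosaic
    for idx, plot in plots.iterrows():
        clipped, transform = mask(src, [plot.geometry], crop=True)
        meta = src.meta.copy()
        meta.update(height=clipped.shape[1], width=clipped.shape[2], transform=transform)
        with rasterio.open(f"plot_{idx}.tif", "w", **meta) as dst:
            dst.write(clipped)                        # one TIFF per plot
```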
As shown in
As shown in
The step of forming the ANN is executed in the computing system of the NN system. The computing system includes machine readable instructions in machine-readable storage media (e.g., in RAM, a hard disk or cloud server) that, when executed by the computing system, perform the data-processing method. The instructions can be generated from source code written in Python 3.7. The computing system may include or access a publicly accessible machine-learning framework, e.g., AutoKeras. The computing system may include a high-speed graphical processing unit (GPU), e.g., with 24 GB of memory.
The NN system and method may be fast enough for real-time inferencing, e.g., with example inference speeds under 10 ms.
In an experimental example for wheat lodging assessment with UAV imagery, the NN method outputs from the image classification and regression tasks were compared to outputs from a manual approach using transfer learning with publicly available CNNs (including VGG networks, residual networks (ResNets), InceptionV3, Xception and densely connected CNNs (DenseNets)), pretrained on the publicly available ImageNet dataset. For image classification, plot images were classified as either non-lodged or lodged; for image regression, lodged plot images were used as inputs to predict lodging scores. The best tested classification performance of 93.2% was jointly achieved by transfer learning with Xception and DenseNet-201 networks. In contrast, the best in test example NN method and system (based on AutoKeras, from 100 trials) achieved an accuracy of 92.4%, which was substantially the same as that obtained by transfer learning with ResNet-50. In another test, the example NN method and system had the best in test accuracy (92.0%) compared to the ResNet-50 (90.4%) in image classification, which assigned wheat plot images as either lodged or non-lodged. For image regression, lodged plot images were used as inputs to predict lodging scores. The example NN method and system performed better (R2=0.8273, RMSE=10.65, MAE=8.24, MAPE=13.87%) in this task compared to the ResNet-50 (R2=0.8079, RMSE=10.81, MAE=7.84, MAPE=14.64%). In another test, the best in test performance (R2=0.8303, RMSE=9.55, MAE=7.03, MAPE=12.54%) was obtained using transfer learning with DenseNet-201, followed closely by the example NN method and system (AutoKeras, R2=0.8273, RMSE=10.65, MAE=8.24, MAPE=13.87%) with the model discovered from 100 trials. In both image classification and regression tasks, transfer learning with DenseNet-201 achieved the best in test results. DenseNet can be considered an evolved version of ResNet, where the outputs of the previous layers are merged via concatenation with succeeding layers to form blocks of densely connected layers; however, similarly to image classification, the DenseNet-201 had the slowest inference time (117.23±15.25 ms) on the test dataset in image regression, making it potentially less suitable for time-critical applications such as real-time inferencing. In comparison, the example NN method and system (AutoKeras) resembled a mini 8-layer Xception model (207,560 parameters) and had the fastest inference time (2.87±0.12 ms) on the test dataset, which was approximately 41-fold faster compared to the DenseNet-201. In its original form, the Xception network is 71 layers deep (approximately 23 million parameters) and consists of three parts: the entry flow, the middle flow and the exit flow; it has an approximately 10-fold faster inference speed (8.46 ms), making it suitable for real-time inferencing. These three parts, and two key features of the Xception network, namely the depthwise separable convolutions and the skip connections originally proposed in ResNet, were discernible from the mini Xception model.
In an experimental example during the winter-spring cropping season of 2018, wheat seeds were sown to a planting density of 150 plants/m2 in individual plots measuring 5 m long and 1 m wide (5 m2), with a total of 1,248 plots (Lat: 36°44′35.21″S, Lon: 142°6′18.01″E), as shown in
In an experimental example of the image classification, the image dataset consisted of 1,248 plot images with 528 plots identified as non-lodged (class 0) and 720 plots identified as lodged (class 1). Images were first resized (downsampled) to the dimensions of 128 width×128 height×3 channels and these were split 80:20 (seed number=123) into training (998 images) and test (250 images) datasets. For image regression, the 720 resized plot images identified as lodged were split 80:20 (seed number=123) into training (576 images) and test (144 images) datasets. Images were fed directly into the example NN method and system (AutoKeras) without pre-processing. In contrast, images were pre-processed to the ResNet-50 format using the provided preprocess_input function in Keras. For model training on both image classification and regression, the training dataset was split further 80:20 (seed number=456) into training and validation datasets. The validation dataset is used to evaluate training efficacy, with lower validation loss indicating a better trained model. Performance of trained models was evaluated on the test dataset. A custom image classifier was defined using the AutoModel class which allows the user to define a custom model by connecting modules/blocks in AutoKeras (as shown in
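By way of illustration only, the following Python sketch defines a custom image classifier with the AutoModel class, assuming the AutoKeras 1.x API; the particular block arguments shown (an Xception-type image block with normalization and no augmentation) are a hypothetical search-space configuration rather than the exact configuration used in the experimental example.

```python
import autokeras as ak

# x_train, y_train, x_test, y_test: the 128 x 128 x 3 plot images and labels prepared as described above.
input_node = ak.ImageInput()
output_node = ak.ImageBlock(block_type="xception", normalize=True, augment=False)(input_node)
output_node = ak.ClassificationHead()(output_node)

clf = ak.AutoModel(inputs=input_node, outputs=output_node, max_trials=100, overwrite=True)
clf.fit(x_train, y_train, validation_split=0.2)      # 80:20 train/validation split
test_metrics = clf.evaluate(x_test, y_test)          # accuracy on the held-out test set
```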
For comparison with the experimental example, the pretrained CNNs (e.g., ResNet-50) were implemented in Keras as a base model using the provided Keras API with the following parameters: weights=“imagenet”, include_top=False and input_shape=(128, 128, 3) (as shown in
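By way of illustration only, the following Python sketch implements transfer learning with a frozen ResNet-50 base using the parameters quoted above; the classification head (global average pooling and a sigmoid dense layer) and the learning rate are assumptions, the images are assumed already converted with preprocess_input as described, and a TensorFlow/Keras version whose Adam optimizer accepts the decay argument is assumed.

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                    input_shape=(128, 128, 3))
base.trainable = False                                    # freeze pretrained weights

inputs = keras.Input(shape=(128, 128, 3))                 # images already in ResNet-50 format
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)        # lodged vs. non-lodged
model = keras.Model(inputs, outputs)

lr = 1e-4                                                 # assumed; Table 4 lists the rates used
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr, decay=lr / 10),
              loss="binary_crossentropy", metrics=["accuracy"])
```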
Model evaluation metrics were selected to compare the pretrained CNNs (e.g., ResNet-50) with the example NN system and method. For image classification, model performance on the test dataset was evaluated using classification accuracy and Cohen's kappa coefficient (Cohen, J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 1960, 20, 37-46). In addition, classification results using ResNet-50 (transfer learning) and the best in test example NN model (from AutoKeras) were visualized using confusion matrices. For image regression, in addition to the mean absolute error (MAE) and the mean absolute percentage error (MAPE) provided by AutoKeras and Keras, the coefficient of determination (R2) and the root mean-squared error (RMSE) were also calculated to determine model performance on the test dataset. Results from ResNet-50 and the best in test example NN model were visualized using regression plots which plotted predicted lodging scores (y_predict) against actual scores (y_test). Models were also evaluated based on total model training time (in minutes, min) and inference time on the test dataset presented as mean±standard deviation per image in milliseconds (ms).
Accuracy: accuracy represents the proportion of correctly predicted data points over all data points. It is the most common way to evaluate a classification model and works well when the dataset is balanced. It may be computed as:
Accuracy = (tp + tn)/(tp + fp + tn + fn)
where tp=true positives, fp=false positives, tn=true negatives and fn=false negatives.
Cohen's kappa coefficient: Cohen's kappa (κ) expresses the level of agreement between two annotators, which in this case is the classifier and the human operator on a classification problem. The kappa score ranges between −1 and 1, with scores above 0.8 generally considered good agreement. It may be computed as:
κ = (po − pe)/(1 − pe)
where po is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and pe is the expected agreement when both annotators assign labels randomly.
Root mean-squared error (RMSE): root mean-squared error provides an idea of how much error a model typically makes in its prediction, with a higher weight for large errors. As such, RMSE is sensitive to outliers, and other performance metrics may be more suitable when there are many outliers. It may be computed as:
RMSE = √((1/n)·Σi(ŷi − yi)²)
where ŷ1 . . . ŷn are predicted values, y1 . . . yn are observed values, and n is the number of observations.
Mean absolute error (MAE): mean absolute error, also called the average absolute deviation, is another common metric used to measure prediction errors in a model by taking the sum of the absolute values of the errors. Compared to RMSE, MAE gives equal weight to all errors and as such may be less sensitive to the effects of outliers. It may be computed as:
MAE = (1/n)·Σi|ŷi − yi|
where ŷ1 . . . ŷn are predicted values, y1 . . . yn are observed values, and n is the number of observations.
Mean absolute percentage error (MAPE): mean absolute percentage error is the percentage equivalent of MAE, with the errors scaled against the observed values. MAPE may be less sensitive to the effects of outliers compared to RMSE but is biased towards predictions that are systematically less than the actual values due to the effects of scaling. It may be computed as:
MAPE = (100%/n)·Σi|(yi − ŷi)/yi|
where ŷ1 . . . ŷn are predicted values, y1 . . . yn are observed values, and n is the number of observations.
Coefficient of determination (R2): coefficient of determination is a value between 0 and 1 that measures how well a regression line fits the data. It can be interpreted as the proportion of variance in the dependent variable that can be explained by the model. It may be computed as:
R2 = 1 − Σi(yi − ŷi)²/Σi(yi − ȳ)²
where ŷ1 . . . ŷn are predicted values, y1 . . . yn are observed values, ȳ is the mean of the observed values, and n is the number of observations.
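By way of illustration only, the following Python sketch computes RMSE, MAE, MAPE and R2 for predicted versus observed values consistently with the formulas above; the array names are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%) and R2 for predicted vs. observed lodging scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))          # assumes no zero observed values
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```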
Both transfer learning with pretrained CNNs (e.g., ResNet-50) and the example NN model performed strongly in the image classification task, as shown in Table 5. In one test, the best in test example NN model (from 100 trials) achieved an accuracy of 92.0% (as shown in
For the image regression task, transfer learning with DenseNet-201 gave the best tested overall performance (R2=0.8303, RMSE=9.55, MAE=7.03, MAPE=12.54%), followed closely by the example NN model (from 100 trials) (R2=0.8273, RMSE=10.65, MAE=8.24, MAPE=13.87%), which in turn outperformed transfer learning using ResNet-50 (R2=0.8079, RMSE=10.81, MAE=7.84, MAPE=14.64%) (as shown in Table 3). The CNN models varied in regression performance, with R2 ranging between 0.76 and 0.83. Within the pretrained CNNs, DenseNet-201 had the slowest model training (7.01 min) and per image inference (0.8141±0.1059 ms) times, with ResNet-50 having the fastest training time (3.55 min) and a per image inference time of 0.5502±0.0716 ms. For the tested example NN method and system, performance generally improved from 10 to 100 trials (Table 7). The example NN method and system (using AutoKeras) was able to achieve this performance using an 8-layer CNN resembling a truncated mini Xception network with 207,560 total parameters (as shown in
The best tested NN model had better performance scores across the board compared to the ResNet-50 model, with the exception that the ResNet-50 had a lower MAE of 7.84 compared to a score of 8.24 by the NN model (as shown in Table 3). A closer inspection of the regression plots for both models showed that the NN model had a much higher number of predictions exceeding the maximum value of 100 (n=39), with the largest predicted value being 118, whereas the ResNet-50 model only had one prediction exceeding 100, with a value of 100.29 (as shown in
The exemplary use of lodging severity introduced a fractional binning (1/3, 2/3, 3/3) of lodging scores, leading to a slightly staggered distribution as evidenced in the regression plots (as shown in
Table 4, showing Adam optimizer learning rates used in transfer learning, wherein the Adam optimizer was applied with the indicated learning rate and decay=learning rate/10:
Table 5, showing model performance metrics for image classification:
Table 6, showing confusion matrices for the test set in Table 5 for the best tested models from transfer learning and the NN model:
Table 7, showing model performance metrics of models for image regression:
The reference herein to machine-readable storage media including machine readable instructions that, when executed by a computing system, perform a method, includes a computer memory encoding computer-executable instructions that, when executed by at least one processor, cause the at least one processor to perform a set of operations comprising said method.
Many modifications will be apparent to those skilled in the art without departing from the scope of the present invention.
The presence of “/” in a FIG. or text herein is understood to mean “and/or” unless otherwise indicated. The recitation of a particular numerical value or value range herein is understood to include or be a recitation of an approximate numerical value or value range, for instance, within +/−20%, +/−15%, +/−10%, +/−5%, +/−2.5%, +/−2%, +/−1%, +/−0.5%, or +/−0%. The terms “substantially” and “essentially all” can indicate a percentage greater than or equal to 90%, for instance, 92.5%, 95%, 97.5%, 99%, or 100%.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that the prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
Number | Date | Country | Kind |
---|---|---|---|
2020902903 | Aug 2020 | AU | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/AU2021/050859 | 8/6/2021 | WO |