
Integrating spaceborne LiDAR and Sentinel-2 images to estimate forest aboveground biomass in Northern China

Abstract

Background

Fast and accurate estimation and mapping of forest aboveground biomass (AGB) underpins forest management and the investigation of ecosystem dynamics, and is of great significance for evaluating forest quality, assessing resources, and managing the carbon cycle. The Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2), one of the most recently launched spaceborne light detection and ranging (LiDAR) sensors, can penetrate the forest canopy and has the potential to provide accurate forest vertical structure parameters on a large scale. However, the along-track canopy height segments provided by ICESat-2 cannot by themselves yield a comprehensive spatial distribution of AGB. To compensate for this limitation of spaceborne LiDAR, Sentinel-2 images accessed through Google Earth Engine (GEE) were used in our study as the medium for integration with ICESat-2 to produce continuous AGB maps. Ensemble learning can combine the advantages of individual estimation models and achieve better estimation results. A stacking algorithm built on four non-parametric base models, namely the backpropagation (BP) neural network, k-nearest neighbor (kNN), support vector machine (SVM), and random forest (RF), was proposed for AGB modeling and estimation in Saihanba forest farm, northern China.

Results

The results show that stacking achieved the best AGB estimation accuracy among the models, with an R2 of 0.71 and a root mean square error (RMSE) of 45.67 Mg/ha. The stacking resulted in the lowest estimation error with the decreases of RMSE by 22.6%, 27.7%, 23.4%, and 19.0% compared with those from the BP, kNN, SVM, and RF, respectively.

Conclusion

Compared with using Sentinel-2 alone, the estimation errors of all models have been significantly reduced after adding the LiDAR variables of ICESat-2 in AGB estimation. The research demonstrated that ICESat-2 has the potential to improve the accuracy of AGB estimation and provides a reference for dynamic forest resources management and monitoring.

Background

As the largest carbon store in the biosphere, the forest ecosystem is an important part of the terrestrial ecosystem and plays an indispensable role in the global carbon cycle [1, 2]. A timely understanding of the current state and dynamics of forest ecosystems is of great significance for coping with global climate change, studying the global carbon cycle, monitoring the environment, and achieving sustainable development. Accurate evaluation of the carbon cycling capacity and carbon storage of forest ecosystems is an important step in the quantitative analysis of carbon sinks [3, 4]. Forest aboveground biomass (AGB) is one of the main components of forest carbon storage because of its large volume and its long-term, large-scale impact on the carbon balance [5]. As an important index of forest quality and forest ecosystem service function, AGB directly measures carbon sequestration capacity [6]. Rapid and accurate acquisition of large-scale AGB underpins forest resource management and ecosystem dynamic monitoring, and is of great significance for studying ecosystem interactions and formulating relevant policies in the process of achieving carbon neutrality [7, 8].

Remote sensing technology can quickly capture the growth status of vegetation over large areas, which provides an effective reference for the monitoring and management of forest resources [9]. Extracting vegetation information from remote sensing images and combining it with a small amount of ground-measured data for modeling has become an effective and popular way of obtaining regional AGB [10]. Spectral reflectance reflects the differences between ground objects, which is the theoretical basis for the remote sensing inversion of forest parameters. Optical images are the remote sensing data with the widest coverage, the greatest variety, and the richest time series in the world [11]. The rich spectral information of optical images can effectively reflect the distribution and growth of vegetation and has been widely used in vegetation classification and forest resource monitoring [12, 13]. The Moderate Resolution Imaging Spectroradiometer (MODIS), a representative source of low spatial resolution optical images, provides periodic land surface coverage on a large scale, which enables national and even global monitoring of land and vegetation changes. However, its coarse spatial resolution leads to heavily mixed pixels, which limits the accuracy of identifying ground objects [14]. Remote sensing images with high spatial resolution can recognize surface objects more accurately, but cloud cover, limited coverage, and high cost restrict their application over large areas [15]. Owing to their moderate spatial resolution, complete coverage, and short revisit period, medium spatial resolution data such as Sentinel-2 and Landsat remain the most popular and widely used optical remote sensing images. Compared with Landsat data, Sentinel-2 carries three red-edge bands that are more sensitive to vegetation growth, so it can provide more accurate information on land change and vegetation growth [16]. In addition, the time-series data provided by Sentinel-2 make it possible to capture high-quality seasonal forest change, which can be effectively used for forest resource monitoring and dynamic management [17, 18]. Google Earth Engine (GEE) is a cloud-based geospatial processing platform that can be used for large-scale terrestrial ecosystem monitoring. GEE archives a large amount of remote sensing data for public use, and users can apply their algorithms directly to these data [19]. Owing to its high efficiency, GEE has been widely used in land cover and land use change (LCLUC) assessment, disaster management, and forest monitoring [20]. GEE integrates a variety of data including MODIS, Sentinel, and Landsat, which can be effectively applied to forest resource monitoring. Using GEE to acquire and process Sentinel-2 data offers the potential to rapidly achieve high-precision forest AGB estimation and mapping on a large scale [21, 22].

Compared with optical remote sensing images, active remote sensing data sources such as synthetic aperture radar (SAR) and light detection and ranging (LiDAR) can penetrate the vegetation canopy to reach the ground surface and obtain information on the vertical structure of the forest stand, thus enabling more accurate estimation of forest parameters [23,24,25]. However, SAR must operate in specific frequency bands, and Gleason et al. [10] found that these bands are not necessarily suitable for AGB estimation in all forest types. Because most forests are located in areas of complex terrain, eliminating the influence of terrain relief on the signal is one of the critical factors limiting the use of SAR data [23]. In addition, saturation in high AGB areas further limits the application of SAR [24].

LiDAR is another commonly used active remote sensing method, which obtains object features by transmitting detection signals to the target and receiving the returns [25]. Among all LiDAR systems, spaceborne LiDAR, with its high orbit and wide observation area, is the only payload that can rapidly obtain three-dimensional spatial information on large scales or even for the global surface [26]. ICESat-2 (Ice, Cloud, and Land Elevation Satellite-2) is one of the latest spaceborne LiDAR systems, with a high repetition rate and high sensitivity, and is the first to apply photon-counting LiDAR (PCL) technology on a satellite platform [27]. ICESat-2 is equipped with the Advanced Topographic Laser Altimeter System (ATLAS), which uses a sensitive single-photon detector. Its high pulse repetition rate yields small footprints and high-density photon point cloud data, allowing more accurate extraction of three-dimensional surface information. ICESat-2 can provide forest vertical structure parameters and effectively alleviate the saturation of optical remote sensing images [28, 29]. Using ICESat-2 to quickly obtain information on global forest dynamics offers the potential to reveal vegetation canopy height, AGB distribution, and their patterns of change over large areas. However, because spaceborne LiDAR data are not spatially continuous, they must be combined with optical or other continuous remote sensing data to obtain a continuous spatial distribution of AGB [30, 31].

Parametric and nonparametric methods are commonly used for forest parameter estimation [18]. Parametric models are simple in form and easy to implement, but fit poorly in complex forest parameter estimation [17]. Nonparametric methods such as backpropagation (BP) neural networks, k-nearest neighbors (kNN), support vector machines (SVM), and random forests (RF) do not require assumptions about the sample distribution, can suppress overfitting, and have been shown to be effective for forest AGB estimation [31, 32]. However, owing to the complexity of the forest environment, the applicability of these models is inconsistent across forest types. In addition, the estimation accuracy of these models when used individually is limited by factors such as sample distribution, modeling variables, and hyperparameters [32]. Ensemble learning, represented by the stacking algorithm, integrates the advantages of multiple base models and can achieve more accurate estimation of forest parameters [31]. However, the effectiveness of stacking built on nonparametric base models for AGB estimation still needs to be validated. This study aimed to propose a stacking algorithm for AGB estimation and continuous AGB mapping in northern China. Nonparametric methods including the BP, kNN, SVM, and RF models were used as base models to construct the stacking model and as baselines for comparison of AGB estimation. Sentinel-2 images accessed through Google Earth Engine (GEE) were used as the medium to synergize with ICESat-2, and measured AGB collected in Saihanba forest farm was used to validate the model results. In addition, the influence of ICESat-2 variables on the accuracy of AGB estimation was tested and discussed.

Methods

Study area

This study was conducted in Saihanba Mechanical Forest Plantation, the largest plantation forest farm in China. Saihanba is located in Hebei province, northern China (116°51′–117°39′ E, 42°02′–42°36′ N) (Fig. 1). The altitude of Saihanba ranges from 1010 to 1940 m, and the area has a temperate continental monsoon climate. The annual average temperature, frost-free period, and average annual precipitation in Saihanba are −1.3 °C, 68 days, and 490 mm, respectively. Larch (Larix olgensis), Scots pine (Pinus sylvestris), Birch (Betula platyphylla Suk), and Spruce (Picea asperata Mast) are the dominant tree species. Saihanba forest farm has a forest coverage rate of more than 80% and a total forest stock volume of 10.12 million m3, making it one of the main sources of timber in China.

Fig. 1 Location and boundary of the study area

Remote sensing data acquisition and processing

ICESat-2 was launched on September 15, 2018 and carries ATLAS, which emits three pairs of beams. The distance between adjacent beam pairs is about 3.3 km, and the distance between the two ground tracks within a pair is 90 m [27]. The ATL08 products (Version 3) of ICESat-2 covering the study area from 2018 to 2019 were downloaded from the National Snow and Ice Data Center (NSIDC) (https://nsidc.org/data/ATL08/) (Fig. 2a). The Differential, Regression, and Gaussian Adaptive Nearest Neighbor (DRAGANN) method was developed to identify and eliminate noise photons during ATL08 production [33]. ATL08 directly provides height estimates for segments spaced at 100 m intervals along the track. Each segment has a radius of 8.5 m and contains information on center coordinates, altitude, height metrics, and apparent surface reflectance (ASR). Height metrics in ATL08 include the minimum, mean, median, and maximum canopy height as well as multiple height percentiles of the vegetation canopy [30, 31]. All segments were masked using the boundary of the study area, and the segments marked as valid in ATL08 were selected. In our study, segments with a canopy height greater than 50 m or less than 5 m were excluded to ensure the stability and validity of the data; most trees below 5 m are in a phase of rapid growth. Finally, a total of 5396 segments were selected as sample plots for AGB estimation in Saihanba (Fig. 2b).
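The screening rule described above can be illustrated with a minimal sketch. The `Segment` record and its field names are hypothetical simplifications and do not correspond to the actual ATL08 file layout; the sketch only assumes that each segment carries a validity flag and a canopy height in metres, and that segments of exactly 5 m or 50 m are retained.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Simplified, hypothetical stand-in for one ATL08 100 m segment."""
    lon: float
    lat: float
    canopy_height_m: float
    valid: bool


def select_segments(segments, min_h=5.0, max_h=50.0):
    """Keep valid segments whose canopy height lies within [min_h, max_h]."""
    return [s for s in segments
            if s.valid and min_h <= s.canopy_height_m <= max_h]


# Illustrative values only; these are not measurements from the study.
segments = [
    Segment(117.20, 42.30, 3.2, True),    # below 5 m: excluded
    Segment(117.21, 42.31, 14.8, True),   # retained
    Segment(117.22, 42.32, 61.0, True),   # above 50 m: excluded
    Segment(117.23, 42.33, 22.5, False),  # not flagged valid: excluded
    Segment(117.24, 42.34, 5.0, True),    # boundary value: retained
]
selected = select_segments(segments)
print([s.canopy_height_m for s in selected])
```

In practice the same filter would be applied after clipping the segments to the study area boundary.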

Fig. 2 a The altitude distribution with the ICESat-2 data and b the selected ATL08 segments covering within forest land in Saihanba

The Google Earth Engine (GEE) platform was used to obtain and preprocess Sentinel-2 images. Images acquired during the growing season of 2017 with cloud cover below 5% were selected. To ensure the stability and reliability of the images, a median composite was computed for all pixels [34]. Sentinel-2 carries three red-edge bands that are highly sensitive to changes in vegetation chlorophyll and have been shown to be effective for forest AGB estimation [17, 18]. Band reflectance and vegetation indices are the basic variables for AGB estimation and can effectively represent the growth status and health of vegetation [10]. Bands with a spatial resolution of 20 m or finer were selected. Vegetation indices (VIs) are obtained by combining bands arithmetically and can be used to describe the growth status of vegetation quantitatively. In addition, red-edge vegetation indices, which are built from the red-edge bands and are highly sensitive to vegetation chlorophyll, can accurately reveal vegetation health. Vegetation indices are closely related to forest AGB and have been shown to be useful as variables for forest parameter modeling and estimation [29,30,31]. Eight vegetation indices including the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Red-green vegetation Index (RGVI), Atmospherically Resistant Vegetation Index (ARVI), Red Edge Normalized Difference Vegetation Index (RENDVI), Red Edge Chlorophyll Index (RECI) and Red Edge Simple Ratio (RESR) [30, 31, 35, 36] were also extracted in the study to establish the coupling relationship with ICESat-2 for continuous AGB mapping (Table 1).
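As an illustration of the median compositing and index calculation described above, the following sketch processes a single pixel in plain Python rather than on GEE. The reflectance values and the function names are hypothetical. The NDVI and EVI expressions are the commonly used standard forms and are assumed, not confirmed, to match the definitions in Table 1.

```python
from statistics import median


def median_composite(observations):
    """Per-band median of repeated observations of one pixel."""
    bands = observations[0].keys()
    return {b: median(obs[b] for obs in observations) for b in bands}


def ndvi(nir, red):
    """Normalized Difference Vegetation Index (standard form)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the usual coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Three hypothetical growing-season surface reflectance observations.
pixel_obs = [
    {"blue": 0.03, "red": 0.04, "nir": 0.40},
    {"blue": 0.04, "red": 0.05, "nir": 0.45},
    {"blue": 0.09, "red": 0.12, "nir": 0.30},  # hazy acquisition
]
composite = median_composite(pixel_obs)
pixel_ndvi = ndvi(composite["nir"], composite["red"])
pixel_evi = evi(composite["nir"], composite["red"], composite["blue"])
print(composite, round(pixel_ndvi, 3), round(pixel_evi, 3))
```

The median suppresses the hazy third observation in every band, which is the reason a median composite is more stable than any single acquisition.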

Table 1 The spectral variables extracted from Sentinel-2 used in this study

Statistics of measured AGB values

The measured AGB used in the study was obtained from the forest management inventory (FMI) database in Saihanba. The FMI is conducted every ten years in China with annual supplementary surveys to update the database, which is one of the main ways to capture dynamic changes in forest resources. FMI can provide scientific data for the quality evaluation of forest resources and the formulation of management policies [37]. Irregular subcompartments containing measured forest data are established in the forest resource database for forest resource management and the database is updated every year by auxiliary investigation. The subcompartments mainly include the attributes of tree species, average canopy height, and tree diameter at breast height (DBH). Tree height was measured by laser altimeter and the diameter tape was used to measure the DBH. The average of three measurements was used as the final result. The updated database of Saihanba forest farm in 2017 was obtained, and the forest data were extracted for AGB calculation.

In our study, FMI subcompartments covered by ICESat-2 segments were selected for forest attribute extraction. Finally, 5396 sample plots comprising seven tree species were determined in Saihanba (Fig. 2). Larch and Birch are the most widely distributed tree species, accounting for 56.1% and 22.4% of the total sample plots, respectively. Species-specific allometric equations summarized by Li et al. [38] were used to calculate the AGB values of the plots (Table 2). The AGB values of all samples ranged from 5.14 to 522.68 Mg/ha, and the mean, standard deviation, and coefficient of variation were 135.30 Mg/ha, 82.13 Mg/ha, and 60.7%, respectively. Among the seven tree species, Chinese Pine had the highest mean AGB and Oak the lowest (Fig. 3).

Table 2 Allometric growth equation based on different tree species for AGB calculation
Fig. 3 The mean and standard deviation of AGB values under seven tree species in Saihanba

Variable selection

An appropriate variable combination can significantly improve the accuracy of the AGB estimation model. Nonlinear variable selection methods can better describe the relationship between remote sensing variables and AGB under complex conditions compared to linear methods [10]. The importance evaluation based on the random forest algorithm can provide the contribution of all variables to the model, to calculate the relative importance of variables [39]. The main idea is that if the out-of-bag accuracy decreases significantly after adding noise to a feature randomly, it means that this feature has a great impact on the prediction result of the sample, i.e., it indicates that its importance is relatively high [17, 31]. Importance ranking can be effective in screening variables and has been shown to be effective for AGB estimation [40].

To determine the validity of the LiDAR variables extracted from ICESat-2 for AGB estimation, importance ranking and an analysis of the error trend were carried out for two variable combinations: (1) spectral variables only, and (2) spectral variables plus LiDAR variables. The variable combination that yielded the minimum RMSE was then used for further modeling. The “randomForest” package [41] in R 4.0.2 was used in the study to calculate the importance of all the feature variables.
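The idea behind the importance evaluation can be sketched as a permutation test: shuffle one variable at a time and record how much the prediction error grows. The sketch below is an illustration under simplifying assumptions and not the procedure of the randomForest package. A fixed function stands in for the fitted model, the error is measured on the full synthetic sample rather than on out-of-bag data, and all data and names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, rng):
    """Increase in RMSE when each feature column is shuffled in turn."""
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
    return importances


rng = random.Random(42)
# Hypothetical predictors: canopy height (m), a vegetation index, pure noise.
X = [[rng.uniform(5, 30), rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]


def predict(row):
    """Stand-in for a fitted model; it ignores the third feature by design."""
    return 8.0 * row[0] + 20.0 * row[1]


y = [predict(row) + rng.gauss(0, 5) for row in X]
importance = permutation_importance(predict, X, y, rng)
ranking = sorted(range(3), key=lambda j: importance[j], reverse=True)
print(ranking, [round(v, 2) for v in importance])
```

A variable that the model does not use receives an importance of zero, which is the property that makes the ranking useful for screening redundant variables.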

AGB estimation models

The backpropagation (BP) neural network is a multilayer feedforward neural network trained with the error backpropagation algorithm. It is one of the most widely used neural network models [42]. However, it easily falls into local minima, which results in low learning efficiency and limits the application of BP in forest parameter estimation. The number of hidden nodes can significantly affect the predictive performance of BP [43]. This parameter was varied from 2 to 500, and the number of hidden nodes giving the minimum RMSE was used for AGB estimation. The support vector machine (SVM) performs regression and classification through a variety of kernel functions and is sparse and robust. However, determining the specific kernel function and model parameters is complex when estimating forest parameters [31, 44]. SVM models with linear, polynomial, and radial basis function kernels were therefore constructed and compared for AGB estimation. The kNN uses the k nearest neighbors to represent the attributes of the samples to be predicted, and is better suited to the automatic classification of class domains with large sample sizes [45]. The Mahalanobis distance has been shown to be the most appropriate distance metric in kNN for vegetation parameter estimation. In addition, the number of nearest neighbors k directly influences the prediction results of kNN [18]. In the study, k was varied from 2 to 50, and the k achieving the minimum RMSE was used in the final kNN model. Random forest is an ensemble algorithm that constructs a large number of decision trees for prediction [39]. Random forests are robust to noise, can effectively handle high-dimensional data, and have been shown to achieve satisfactory accuracy and robustness in estimating vegetation parameters such as leaf area index (LAI) and growing stem volume (GSV) [40]. The mtry and ntrees parameters have the greatest effect on RF modeling and estimation: mtry is the number of variables randomly sampled as candidates at each split of a decision tree, and ntrees is the number of decision trees [39]. Nonparametric models have been proved to be effective in estimating vegetation parameters. However, for forest ecosystems, the estimation accuracy of these models when used independently is still limited [31].
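The parameter search described for kNN (trying k from 2 to 50 and keeping the value with the lowest RMSE) can be sketched as follows. This is a simplified illustration rather than the study's R implementation: it uses Euclidean distance instead of the Mahalanobis distance, a single hold-out set, and synthetic data; all names and values are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train_X, train_y, x, k):
    """Average the responses of the k training samples nearest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def tune_k(train_X, train_y, val_X, val_y, k_values):
    """Return the k with the lowest validation RMSE, plus all scores."""
    scores = {}
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        scores[k] = rmse(val_y, preds)
    return min(scores, key=scores.get), scores


rng = random.Random(1)
# Hypothetical samples: (canopy height in m, vegetation index) -> AGB in Mg/ha.
X = [[rng.uniform(5, 30), rng.uniform(0.2, 0.9)] for _ in range(200)]
y = [6.0 * h + 50.0 * vi + rng.gauss(0, 10) for h, vi in X]
train_X, train_y, val_X, val_y = X[:140], y[:140], X[140:], y[140:]

best_k, scores = tune_k(train_X, train_y, val_X, val_y, range(2, 51))
print(best_k, round(scores[best_k], 2))
```

The same grid-search pattern applies to the number of hidden nodes in BP and to mtry and ntrees in RF, with only the model-fitting step changed.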

The stacking algorithm trains a model by constructing and combining multiple base models, which often yields significantly better prediction and generalization performance than a single model. First, the base models are trained to obtain predictions; then, the predictions from all base models are combined as new training samples for a second-level model that produces the final predictions. Stacking can integrate and balance the outputs of all base models, which can effectively improve prediction accuracy and reduce estimation errors. To combine the advantages of the nonparametric models and improve estimation accuracy, a stacking model was proposed for AGB estimation in the study. The BP, SVM, kNN, and RF models were used as base models to estimate AGB, and their predicted values were then used as new training samples to train a new model (Fig. 4). The final predictions were the output of the stacking algorithm integrating the base models. All the models were built and run in R 4.0.2.
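The two-level structure of stacking can be made concrete with a small sketch. It is not the study's implementation: only two simple base learners are used (a kNN regressor and a one-variable linear fit) instead of BP, SVM, kNN, and RF; the second-level model is assumed to be a least-squares weighting of the base predictions; and the base predictions are generated out-of-fold, a common practice that the source does not specify. All data and names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_knn(X, y, k=5):
    """Base learner 1: kNN regression with Euclidean distance."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / k
    return predict


def fit_line(X, y):
    """Base learner 2: ordinary least squares on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x[0] - mx)


def out_of_fold(fit, X, y, n_folds=5):
    """Predict every sample with a model that did not see it during fitting."""
    preds = [0.0] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), n_folds):
            preds[i] = model(X[i])
    return preds


def fit_meta(z1, z2, y):
    """Least-squares weights for y ~ w1*z1 + w2*z2 (2x2 normal equations)."""
    a = sum(p * p for p in z1)
    b = sum(p * q for p, q in zip(z1, z2))
    d = sum(q * q for q in z2)
    e = sum(p * t for p, t in zip(z1, y))
    f = sum(q * t for q, t in zip(z2, y))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det


rng = random.Random(0)
# Hypothetical samples: (canopy height in m, vegetation index) -> AGB in Mg/ha.
X = [[rng.uniform(5, 30), rng.uniform(0, 1)] for _ in range(100)]
y = [6.0 * h + 40.0 * vi ** 2 + rng.gauss(0, 8) for h, vi in X]

oof_knn = out_of_fold(fit_knn, X, y)
oof_line = out_of_fold(fit_line, X, y)
w1, w2 = fit_meta(oof_knn, oof_line, y)
stacked = [w1 * p + w2 * q for p, q in zip(oof_knn, oof_line)]
print(round(rmse(y, oof_knn), 2), round(rmse(y, oof_line), 2), round(rmse(y, stacked), 2))
```

Because the weights are fitted by least squares on the base predictions, the stacked error on these samples cannot exceed that of either base learner; an independent validation set is still needed to judge generalization.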

Fig. 4 The basic framework of the stacking method

Accuracy assessment

To verify the estimation effect of the model, seventy percent of the samples were randomly selected as training samples (70%, n = 3777) to build the model, and the remaining thirty percent (30%, n = 1619) were used for validation. The coefficient of determination R2 was used to measure the effect of fitting between the predicted and observed values and the root mean square error (RMSE) was used to calculate the estimation error of the models [46]. A larger R2 represents the better fitting between the observed value and the predicted value. The smaller the RMSE, the smaller the error of model estimation.
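The random 70/30 partition can be sketched as below. The function name and the seed are hypothetical; the only assumption taken from the text is that the training share is rounded to a whole number of samples.

```python
import random


def train_validation_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = train_validation_split(5396)
print(len(train_idx), len(val_idx))
```

For the 5396 sample plots this yields 3777 training and 1619 validation samples, matching the counts given above.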

$$\mathrm{R}^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}, \tag{1}$$

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(\widehat{y}_{i}-y_{i}\right)^{2}}{n}}, \tag{2}$$

where \(y_{i}\) is the measured AGB value of sample i, \(\widehat{y}_{i}\) is the corresponding estimated AGB value, \(\overline{y}\) is the mean of the measured AGB values, and n is the sample size.
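Equations (1) and (2) translate directly into code. The following minimal sketch uses four made-up observed and predicted AGB values (in Mg/ha) purely to show the calculation; the function names are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination as defined in Eq. (1)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error as defined in Eq. (2)."""
    n = len(observed)
    return math.sqrt(sum((y_hat - y) ** 2 for y, y_hat in zip(observed, predicted)) / n)


# Illustrative values only; these are not data from the study.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 260.0]
r2_value = r_squared(observed, predicted)
rmse_value = rmse(observed, predicted)
print(round(r2_value, 3), round(rmse_value, 2))
```

Here every prediction is off by 10 Mg/ha, so the RMSE is 10 Mg/ha, while R2 is 1 − 400/12500 = 0.968.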

Results

Variable selection and AGB estimation

Among the variables extracted from ICESat-2, the 98th percentile height ranked highest in importance, indicating a strong relationship with AGB. It was followed by the maximum and 25th percentile heights, whereas the 85th percentile height had the lowest importance. Figure 5 shows the partial importance ranking of the spectral variables and of the combination of spectral variables and ICESat-2 LiDAR variables. Among the spectral variables, the red-edge vegetation indices maintained relatively high importance. In the combined variable set, the LiDAR variables generally ranked higher than the spectral variables.

Fig. 5 Partial importance ranking of the LiDAR variables extracted from ICESat-2

The RMSE of the estimation model based on spectral variables mainly ranged from 70 to 85 Mg/ha. After the LiDAR variables were added, the RMSE decreased significantly, ranging from 55 to 65 Mg/ha (Fig. 6). The RMSE reached its minimum and then remained stable when the number of variables was 10 and 26, respectively.

Fig. 6 RMSE change based on a spectral variables, b spectral variables and LiDAR variables

Comparison of the AGB estimation results

Estimation models were constructed from the variables selected by importance evaluation. The BP, SVM, kNN, RF, and stacking models were established for AGB estimation in Saihanba. Figure 7 shows the fit of AGB estimation using Sentinel-2 only. The results of BP, SVM, kNN, and RF were similar, with coefficients of determination below 0.3. After ICESat-2 was included, the fit of all models improved significantly (Fig. 8). Compared with using only the spectral variables of Sentinel-2, adding the LiDAR variables extracted from ICESat-2 significantly reduced the AGB estimation error. Stacking achieved the highest R2 and the lowest RMSE in both cases. Compared with the BP, SVM, kNN, and RF models, the RMSE of stacking decreased by 25.7%, 29.3%, 25.1%, and 20.8%, respectively, when ICESat-2 was included.

Fig. 7 Scatter plots of the observed AGB against the predicted values by a BP, b kNN, c SVM, d RF and e stacking using the spectral variables (Mg/ha)

Fig. 8 Scatter plots of the observed AGB against the predicted values by a BP, b kNN, c SVM, d RF and e stacking using the spectral variables and LiDAR variables extracted from ICESat-2 (Mg/ha)

Continuous AGB mapping

To obtain a continuous map, the AGB values predicted by the stacking algorithm were taken as derived reference values and then combined with the spatially continuous Sentinel-2 images. The relative importance of the spectral variables extracted from Sentinel-2 was ranked, and the variable combination used for AGB mapping was determined. Figure 9 shows the continuous spatial distribution of AGB in Saihanba. The smallest predicted AGB values were distributed in the west of the study area, while the larger values were mainly distributed in the northeast and south. The AGB values in the southeast were lower than those in the west. The predicted spatial pattern of AGB conformed to the actual distribution in Saihanba.

Fig. 9 Continuous spatial distribution of AGB in Saihanba

Discussion

Variable selection methods for AGB estimation

The combination of variables directly affects the estimation accuracy and computational efficiency of the model. Linear stepwise regression is one of the most commonly used variable selection methods and can quickly screen out variables that are significantly linearly related to AGB [47]. However, owing to the complexity and instability of the forest ecosystem, the linear method is limited in AGB estimation [48]. Importance evaluation provides a nonlinear variable selection process with greater potential than the linear method [49]. To verify the variable screening method based on importance evaluation and compare it with a linear alternative, the Pearson correlation coefficient was used to test the linear relationship between all variables and AGB, and stepwise regression was used to screen the variable combination for a linear model. Figure 10a shows that, among the LiDAR variables, the 98th percentile height had the highest correlation with AGB, with a positive correlation of 0.59 (P < 0.01). The RMSEs of the linear regression model using the spectral variables and the combination of spectral and LiDAR variables were 76.03 Mg/ha and 60.11 Mg/ha, respectively; the corresponding stacking RMSEs were 23.6% and 24.0% lower. In addition, the linear model also showed a significant decrease in RMSE after the LiDAR variables were added (Fig. 10).
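The percentages in this comparison can be checked with simple arithmetic. The sketch below rests on two assumptions that the text does not state explicitly: that the 45.67 Mg/ha reported for stacking refers to the combined spectral and LiDAR variables, and that the percentages express the reduction relative to the linear model. The spectral-only stacking RMSE is not reported directly, so the value used here is implied from the 23.6% figure. The helper name is hypothetical.

```python
def percent_reduction(reference_rmse, new_rmse):
    """Relative decrease of new_rmse with respect to reference_rmse, in percent."""
    return 100.0 * (reference_rmse - new_rmse) / reference_rmse


linear_spectral = 76.03    # Mg/ha, linear model, spectral variables
linear_combined = 60.11    # Mg/ha, linear model, spectral + LiDAR variables
stacking_combined = 45.67  # Mg/ha, stacking (assumed: spectral + LiDAR variables)

# Reduction of stacking relative to the linear model with combined variables.
combined_gain = percent_reduction(linear_combined, stacking_combined)

# Implied (not reported) spectral-only stacking RMSE from the 23.6% figure.
implied_stacking_spectral = linear_spectral * (1.0 - 0.236)

# Implied effect of adding the LiDAR variables to stacking.
lidar_gain = percent_reduction(implied_stacking_spectral, stacking_combined)

print(round(combined_gain, 1), round(implied_stacking_spectral, 2), round(lidar_gain, 1))
```

Under these assumptions the combined-variable reduction comes out at about 24.0%, and the implied gain from adding the LiDAR variables is about 21.4%.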

Fig. 10 a Correlation coefficient matrix of LiDAR variables and AGB, and scatter plots of the observed AGB against the predicted values by linear regression using b the spectral variables, and c spectral variables and LiDAR variables (Mg/ha)

Uncertainty, limitations, and prospects

Many factors, such as the type of remote sensing image, the modeling variables, and the estimation model, can affect AGB estimation and introduce uncertainty and estimation error [10, 29, 31]. In our study, Sentinel-2 images were used for modeling and as the medium for continuous AGB mapping. Sentinel-2 carries three red-edge bands that are highly sensitive to changes in forest chlorophyll, which is very effective for forest AGB estimation [32]. Because of cloud cover and the revisit period, it is difficult to obtain high-quality Sentinel-2 images that coincide exactly with the survey time of the sample plots. More importantly, the reflectance obtained by Sentinel-2 on a single date may not accurately reflect the real forest conditions owing to sensor errors and atmospheric effects. In the study, the Google Earth Engine (GEE) platform was used to obtain and preprocess Sentinel-2 images. Images acquired during the growing season (from July to September) of 2017 with cloud cover below 5% were selected. To ensure the stability and reliability of the images, a median composite was computed for all pixels. Optical images can be used to extract band reflectance, vegetation indices, and other variables for AGB modeling and estimation [40]. To reduce invalid information and redundancy, importance evaluation was used in the study to screen variables and thereby improve the accuracy and efficiency of the model. ICESat-2 was used to extract canopy structure variables. The results showed that including LiDAR-derived canopy structure variables significantly improves the accuracy of the AGB estimation model. LiDAR can penetrate the forest canopy to obtain information on forest vertical structure, which alleviates the saturation common with optical imagery [25]. ICESat-2 is one of the most recently launched spaceborne LiDAR systems; it can obtain global vegetation height parameters and has the potential for high-precision and high-efficiency monitoring of large-scale forests [27,28,29,30]. However, the segment diameter of 17 m may not be sufficient to describe accurately the ground objects inside the segment. Magruder et al. [50] demonstrated that the average effective diameter at White Sands Missile Range in New Mexico and along a segment of the 88°S line of latitude in Antarctica is about 10–11 m. For forest ecosystems with complex terrain and different vegetation types, the specific effective diameter is highly relevant and needs to be verified. The Global Ecosystem Dynamics Investigation (GEDI) is another spaceborne LiDAR data source. GEDI provides vertical canopy waveform information between 52°N and 52°S, which further complements the available spaceborne LiDAR data [51]. Compared with ICESat-2, which provides a canopy height product, GEDI acquires waveform data containing the physical properties within the footprint, which allows more efficient detection of different forest types. However, the GEDI footprints currently covering a given area are limited because of the short time since launch and the revisit period. In addition, as with ICESat-2, the along-track sampling makes it difficult to obtain a wall-to-wall spatial distribution of AGB. Spaceborne LiDAR systems such as ICESat-2 and GEDI have great potential for estimating global forest height, AGB, and carbon sinks [52, 53], but how to combine them with optical data and other spatially continuous environmental variables to achieve more accurate continuous mapping will be a focus of future work [54, 55].

In complex forest ecosystems, the relationship between remote sensing variables and AGB may not be simply linear, which limits the effectiveness and application of linear variable selection methods [48, 49]. In our study, the nonlinear method based on importance evaluation proved to be significantly better than the linear method. To test the influence of the variables on the uncertainty of AGB estimation, a t-test was conducted to examine the relationship between the modeling variables and the absolute residuals. The results showed that the relationships between all variables and the residuals were not statistically significant, indicating that the variables selected by importance evaluation do not cause significant estimation error or uncertainty.

Parametric and nonparametric models are both commonly used for AGB estimation [10, 18]. Parametric models are simple to implement, but they are prone to overfitting and have low stability. Nonparametric models such as machine learning methods have gradually become popular in AGB estimation [17, 56]. However, the performance of any single model in a complex forest ecosystem is always limited. Ensemble learning can combine the advantages of all base models, thereby improving estimation efficiency and accuracy [57, 58]. Even if one base model makes a wrong prediction, the other base models can correct the error to achieve a better prediction [52]. Jiang et al. [31] demonstrated that it is feasible to construct stacking models for estimating forest canopy height from the synergistic data sources of ICESat-2 and Sentinel-2. Generally, forest canopy height is used as an intermediate variable from which forest AGB is calculated through allometric equations. In Saihanba, canopy height information extracted from the ATL08 product of ICESat-2 was used directly for AGB estimation, which can reduce the indirect transfer error. Moreover, the stacking model was constructed from nonparametric models, which avoided the overfitting easily caused by parametric models, especially multiple linear regression (MLR). The overall accuracy of the nonparametric methods was also better than that of MLR in this study. Compared with the original base models, the RMSE of the stacking model was reduced by 19.0% to 27.7%, a substantial improvement in estimation accuracy. Furthermore, the study emphasized combining Sentinel-2 variables and LiDAR variables for AGB estimation. Compared with using only the spectral variables of Sentinel-2, the RMSE with the joint variables was reduced by 21.4%, showing that the combination makes fuller use of the remote sensing information and thereby improves the estimation and mapping of AGB.
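
The stacking idea discussed above can be illustrated with a short sketch. It is not the authors' implementation. It assumes scikit-learn, synthetic stand-in data instead of the Saihanba plots, illustrative hyperparameters, and a linear meta-learner, none of which are specified in this section. The four base learners mirror the BP neural network, kNN, SVM, and RF named in the study, and the relative RMSE reduction is computed as (RMSE_base − RMSE_stack) / RMSE_base.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the study data): five hypothetical predictors
# and a nonlinear response with noise, loosely scaled like AGB in Mg/ha.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 5))
y = (150.0 * X[:, 0] + 80.0 * np.sin(3.0 * X[:, 1])
     + 40.0 * X[:, 2] * X[:, 3] + rng.normal(0.0, 10.0, 300))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)


def make_base_models():
    """Four nonparametric base learners; all hyperparameters are assumptions."""
    return [
        ("bp", make_pipeline(StandardScaler(), MLPRegressor(
            hidden_layer_sizes=(16,), solver="lbfgs", max_iter=2000,
            random_state=0))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
        ("svm", make_pipeline(StandardScaler(), SVR(C=100.0))),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# The meta-learner is trained on out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=make_base_models(),
                          final_estimator=LinearRegression(), cv=5)
stack.fit(X_train, y_train)
stack_rmse = rmse(y_test, stack.predict(X_test))

# Fit each base model on its own for comparison with the stack.
base_rmse = {}
for name, model in make_base_models():
    model.fit(X_train, y_train)
    base_rmse[name] = rmse(y_test, model.predict(X_test))

for name, value in base_rmse.items():
    reduction = 100.0 * (value - stack_rmse) / value
    print(f"{name}: RMSE = {value:.2f}, change with stacking = {reduction:.1f}%")
print(f"stacking: RMSE = {stack_rmse:.2f}")
```

On synthetic data the stack is not guaranteed to outperform every base model, so the printed reductions only illustrate the calculation and should not be compared with the values reported for Saihanba.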

In addition, GEE can provide seamless optical data, such as Sentinel-2, Landsat, and MODIS, which are free and publicly available. Combining ICESat-2 with optical data and estimation models has the potential to obtain large-scale forest AGB efficiently [30, 59, 60].

Conclusions

LiDAR can penetrate the forest canopy and obtain more accurate vertical structure information, which has the potential to improve AGB estimation. In this study, optical remote sensing data were combined with spaceborne LiDAR data to map the continuous spatial pattern of AGB over a large area with high accuracy. ICESat-2 and Sentinel-2 data were acquired to extract LiDAR variables and spectral variables, respectively. Nonparametric models and a stacking model were constructed to estimate AGB for comparison and validation. The results showed that including the LiDAR variables significantly improved the AGB estimation accuracy and reduced the error compared with using spectral variables only. The stacking model achieved the highest AGB estimation accuracy and the lowest RMSE, which was 19.0% to 27.7% lower than that of the base models. In addition, the nonlinear variable selection method based on importance evaluation proved to be better than the linear method for AGB estimation in Saihanba.

Availability of data and materials

The data are available from the authors upon reasonable request.

Abbreviations

AGB: Aboveground biomass
ICESat-2: Ice, Cloud, and Land Elevation Satellite-2
LiDAR: Light detection and ranging
ASR: Apparent surface reflectance
GEE: Google Earth Engine
SVM: Support vector machine
kNN: k-nearest neighbor
BP: Backpropagation
RMSE: Root mean square error

References

  1. Seidl R, Eastaugh CS, Kramer K, Maroschek M, Hasenauer H. Scaling issues in forest ecosystem management and how to address them with models. Eur J Forest Res. 2013;132(5–6):653–66. https://doi.org/10.1007/s10342-013-0725-y.

  2. He HS, Hao ZQ, Mladenoff DJ, Shao GF, Hu YM, Chang Y. Simulating forest ecosystem response to climate warming incorporating spatial effects in north-eastern China. J Biogeogr. 2005;32(12):2043–56. https://doi.org/10.1111/j.1365-2699.2005.01353.x.

  3. Bonan GB. Forests and climate change: forcings, feedbacks, and the climate benefits of forests. Science. 2008;320(5882):1444–9. https://doi.org/10.1126/science.1155121.

  4. Brown S, Lugo AE. The storage and production of organic matter in tropical forests and their role in the global carbon cycle. Biotropica. 1982;14(3):161–87. https://doi.org/10.2307/2388024.

  5. Brown S. Measuring carbon in forests: current status and future challenges. Environ Pollut. 2002;116(3):363–72. https://doi.org/10.1016/S0269-7491(01)00212-3.

  6. Brown S, Gillespie AJR, Lugo AE. Biomass estimation methods for tropical forests with applications to forest inventory data. For Sci. 1989;35(4):881–902. https://doi.org/10.1093/forestscience/35.4.881.

  7. Qin H, Cheng W, Xi X, Tian J, Zhou G. Estimation of coniferous forest aboveground biomass with aggregated airborne small-footprint lidar full-waveforms. Opt Express. 2017;25(16):A851. https://doi.org/10.1364/oe.25.00a851.

  8. Englhart S, Keuck V, Siegert F. Aboveground biomass retrieval in tropical forests—the potential of combined X- and L-band SAR data use. Remote Sens Environ. 2011;115(5):1260–71. https://doi.org/10.1016/j.rse.2011.01.008.

  9. Dong J, Kaufmann RK, Myneni RB, Tucker CJ, Kauppi PE, Liski J. Remote sensing estimates of boreal and temperate forest woody biomass: carbon pools, sources, and sinks. Remote Sens Environ. 2003;84(3):393–410. https://doi.org/10.1016/S0034-4257(02)00130-x.

  10. Gleason CJ, Im J. A review of remote sensing of forest biomass and biofuel: options for small-area applications. GISci Remote Sens. 2011;48:141–70. https://doi.org/10.2747/1548-1603.48.2.141.

  11. Zhao P, Lu D, Wang G, Wu C, Huang Y, Yu S. Examining spectral reflectance saturation in Landsat imagery and corresponding solutions to improve forest aboveground biomass estimation. Remote Sens. 2016;8(6):469. https://doi.org/10.3390/rs8060469.

  12. Bubier JL, Rock BN, Crill PM. Spectral reflectance measurements of boreal wetland and forest mosses. J Geophys Res. 1997;102(D24):29483–94. https://doi.org/10.1029/97JD02316.

  13. Myneni R, Maggion S, Iaquinta J, Privette J, Gobron N, Pinty B, Kimes D, Verstraete M, Williams D. Optical remote sensing of vegetation: modeling, caveats, and algorithms. Remote Sens Environ. 1995;51:169–88. https://doi.org/10.1016/0034-4257(94)00073-v.

  14. Huete A, Didan K, Miura T, Rodriguez EP, Gao X, Ferreira LG. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens Environ. 2002;83(1–2):195–213. https://doi.org/10.1016/S0034-4257(02)00096-2.

  15. Immitzer M, Atzberger C, Koukal T. Tree species classification with random forest using very high spatial resolution 8-band worldview-2 satellite data. Remote Sens. 2012;4:2661–93. https://doi.org/10.3390/rs4092661.

  16. Drusch M, Del Bello U, Carlier S, Colin O, Fernandez V, Gascon F, Hoersch B, Isola C, Laberinti P, Martimort P, Meygret A. Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sens Environ. 2012;120:25–36. https://doi.org/10.1016/j.rse.2011.11.026.

  17. Hu Y, Xu X, Wu F, Sun Z, Xia H, Meng Q, Huang W, Zhou H, Gao J, Li W. Estimating forest stock volume in Hunan Province, China, by integrating in situ plot data, sentinel-2 images, and linear and machine learning regression models. Remote Sens. 2020;12:186. https://doi.org/10.3390/rs12010186.

  18. Verrelst J, Rivera JP, Veroustraete F, Muñoz-Marí J, Clevers JG, Camps-Valls G, Moreno J. Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods—a comparison. ISPRS J Photogramm Remote Sens. 2015;108:260–72. https://doi.org/10.1016/j.isprsjprs.2015.04.013.

  19. Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Moore R. Google earth engine: planetary-scale geospatial analysis for everyone. Remote Sens Environ. 2017;202:18–27. https://doi.org/10.1016/j.rse.2017.06.031.

  20. Dong J, Xiao X, Menarguez MA, Zhang G, Qin Y, Thau D. Mapping paddy rice planting area in northeastern ASIA with Landsat 8 images, phenology-based algorithm and google earth engine. Remote Sens Environ. 2016;185:142–54. https://doi.org/10.1016/j.rse.2016.02.016.

  21. Patel NN, Angiuli E, Gamba P, Gaughan A, Lisini G, Stevens FR. Multitemporal settlement and population mapping from landsat using google earth engine. Int J Appl Earth Obs Geoinf. 2015;35:199–208. https://doi.org/10.1016/j.jag.2014.09.005.

  22. Tang Z, Li Y, Gu Y, Jiang W, Xue Y, Hu Q. Assessing nebraska playa wetland inundation status during 1985–2015 using landsat data and google earth engine. Environ Monit Assess. 2016;188(12):654. https://doi.org/10.1007/s10661-016-5664-x.

  23. Minh D, Toan TL, Rocca F, Tebaldini S, D’Alessandro MM, Villard L. Relating p-band synthetic aperture radar tomography to tropical forest biomass. IEEE Trans Geosci Remote Sens. 2013;52(2):967–79. https://doi.org/10.1109/TGRS.2013.2246170.

  24. Imhoff ML. Radar backscatter and biomass saturation: ramifications for global biomass inventory. IEEE Trans Geosci Remote Sens. 1995;33(2):511–8. https://doi.org/10.1109/TGRS.1995.8746034.

  25. Luo S, Chen JM, Wang C, Xi X, Zeng H, Peng D. Effects of lidar point density, sampling size and height threshold on estimation accuracy of crop biophysical parameters. Opt Express. 2016;24(11):11578. https://doi.org/10.1364/OE.24.011578.

  26. Simard M, Pinto N, Fisher JB, Baccini A. Mapping forest canopy height globally with spaceborne lidar. J Geophys Res G: Biogeosci. 2011;116(G4):4021. https://doi.org/10.1029/2011JG001708.

  27. Abdalati W, Zwally HJ, Bindschadler R, Csatho B, Webb C. The ICESat-2 laser altimetry mission. Proc IEEE. 2010;98(5):735–51. https://doi.org/10.1109/JPROC.2009.2034765.

  28. Narine LL, Popescu SC, Malambo L. Synergy of ICESat-2 and landsat for mapping forest aboveground biomass with deep learning. Remote Sens. 2019;11:1503. https://doi.org/10.3390/rs11121503.

  29. Montesano PM, Rosette J, Sun G, North P, Nelson RF, Dubayah RO. The uncertainty of biomass estimates from modeled ICESat-2 returns across a boreal forest gradient. Remote Sens Environ. 2015;158:95–109. https://doi.org/10.1016/j.rse.2014.10.029.

  30. Li W, Niu Z, Shang R, Qin Y, Wang L, Chen H. High-resolution mapping of forest canopy height using machine learning by coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 data. Int J Appl Earth Obs Geoinformation. 2020;92: 102163. https://doi.org/10.1016/j.jag.2020.102163.

  31. Jiang F, Zhao F, Ma K, Li D, Sun H. Mapping the forest canopy height in Northern China by synergizing ICESat-2 with sentinel-2 using a stacking algorithm. Remote Sens. 2021;13:1535. https://doi.org/10.3390/rs13081535.

  32. Ahmadi K, Kalantar B, Saeidi V, Harandi EKG, Janizadeh S, Ueda N. Comparison of machine learning methods for mapping the stand characteristics of temperate forests using multi-spectral sentinel-2 data. Remote Sens. 2020;12:3019. https://doi.org/10.3390/rs12183019.

  33. Neuenschwander A, Pitts K. The ATL08 land and vegetation product for the ICESat-2 mission. Remote Sens Environ. 2019;221:247–59. https://doi.org/10.1016/j.rse.2018.11.005.

  34. Tamiminia H, Salehi B, Mahdianpari M, Quackenbush L, Adeli S, Brisco B. Google Earth Engine for geo-big data applications: a meta-analysis and systematic review. ISPRS J Photogramm Remote Sens. 2020;164:152–70. https://doi.org/10.1016/j.isprsjprs.2020.04.001.

  35. Xiao C, Peng L, Feng Z, Liu Y, Zhang X. Sentinel-2 red-edge spectral indices (RESI) suitability for mapping rubber boom in luang namtha province, northern Lao PDR. Int J Appl Earth Obs Geoinf. 2020;93: 102176. https://doi.org/10.1016/j.jag.2020.102176.

  36. Fernández-Manso A, Fernández-Manso O, Quintano C. Sentinel-2A red-edge spectral indices suitability for discriminating burn severity. Int J Appl Earth Obs Geoinf. 2016;50:170–5. https://doi.org/10.1016/j.jag.2016.03.005.

  37. Zhao M, Yang J, Zhao N, Liu Y, Yue T. Estimation of china’s forest stand biomass carbon sequestration based on the continuous biomass expansion factor model and seven forest inventories from 1977 to 2013. For Ecol Manage. 2019;448:528–34. https://doi.org/10.1016/j.foreco.2019.06.036.

  38. Li HK, Lei YC. Assessment of forest vegetation biomass and carbon storage in China. 2010 (ISBN: 978-7-5038-5809-3).

  39. Breiman L. Random forests. Mach Learn. 2001;45:5–32. https://doi.org/10.1023/A:1010933404324.

  40. Jiang F, Kutia M, Ma K, Chen S, Long J, Sun H. Estimating the aboveground biomass of coniferous forest in Northeast China using spectral variables, land surface temperature and soil moisture. Sci Total Environ. 2021;785: 147335. https://doi.org/10.1016/j.scitotenv.2021.147335.

  41. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18–22. https://doi.org/10.1023/A:1010933404324.

  42. Li B, Wang W, Bai L, Chen N, Wang W. Estimation of aboveground vegetation biomass based on landsat-8 oli satellite images in the guanzhong basin, China. Int J Remote Sens. 2019;40(9–10):3927–47. https://doi.org/10.1080/01431161.2018.1553323.

  43. Song S, Xiong X, Wu X, Xue Z. Modeling the SOFC by bp neural network algorithm. Int J Hydrogen Energy. 2021;46(38):20065–77. https://doi.org/10.1016/j.ijhydene.2021.03.132.

  44. Saunders C, Stitson MO, Weston J, et al. Support vector machine. Comput Sci. 2002;1(4):1–28. https://doi.org/10.1007/978-3-642-27733-7_299-3.

  45. Gjertsen A. Accuracy of forest mapping based on Landsat TM data and a kNN-based method. Remote Sens Environ. 2007;110:420–30. https://doi.org/10.1016/j.rse.2006.08.018.

  46. Willmott CJ, Ackleson SG, Davis RE, Feddema JJ, Klink KM, Legates DR, O’donnell J, Rowe CM. Statistics for the evaluation and comparison of models. J Geophys Res Space Phys. 1985;90:8995–9005. https://doi.org/10.1029/JC090iC05p08995.

  47. Long J, Lin H, Wang G, Sun H, Yan E. Estimating the growing stem volume of the planted forest using the general linear model and time series quad-polarimetric SAR images. Sensors. 2020;20:3957. https://doi.org/10.3390/s20143957.

  48. Mutanga O, Adam E, Cho MA. High density biomass estimation for wetland vegetation using worldview-2 imagery and random forest regression algorithm. Int J Appl Earth Obs Geoinf. 2012;18:399–406. https://doi.org/10.1016/j.jag.2012.03.012.

  49. Chen Y, Li L, Lu D, Li D. Exploring bamboo forest aboveground biomass estimation using sentinel-2 data. Remote Sens. 2019;11:7. https://doi.org/10.3390/rs11010007.

  50. Magruder L, Brunt K, Neumann T, Klotz B, Alonzo M. Passive ground-based optical techniques for monitoring the on-orbit ICESat-2 altimeter geolocation and footprint diameter. Earth Space Sci. 2021;8:e2020EA001414. https://doi.org/10.1029/2020EA001414.

  51. Liu A, Cheng X, Chen Z. Performance evaluation of GEDI and ICESat-2 laser altimeter data for terrain and canopy height retrievals. Remote Sens Environ. 2021;264: 112571. https://doi.org/10.1016/j.rse.2021.112571.

  52. Silva CA, Duncanson L, Hancock S, Neuenschwander A, Thomas N, Hofton M, Fatoyinbo L, Simard M, Marshak CZ, Armston J. Fusing simulated GEDI, ICESat-2 and NISAR data for regional aboveground biomass mapping. Remote Sens Environ. 2021;253: 112234. https://doi.org/10.1016/j.rse.2020.112234.

  53. Wang XZ, Xing HJ, Li Y, Hua Q, Dong CR, Pedrycz W. A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst. 2015;23(5):1638–54. https://doi.org/10.1109/TFUZZ.2014.2371479.

  54. Chen L, Ren C, Bao G, Zhang B, Wang Z, Liu M, Man W, Liu J. Improved object-based estimation of forest aboveground biomass by integrating LiDAR data from GEDI and ICESat-2 with multi-sensor images in a heterogeneous mountainous region. Remote Sens. 2022;14(12):2743. https://doi.org/10.3390/rs14122743.

  55. Nandy S, Srinet R, Padalia H. Mapping forest height and aboveground biomass by integrating ICESat-2, Sentinel-1 and Sentinel-2 data using Random Forest algorithm in northwest Himalayan foothills of India. Geophys Res Lett. 2021;48(14):e2021GL093799. https://doi.org/10.1029/2021GL093799.

  56. Zubler AV, Yoon JY. Proximal methods for plant stress detection using optical sensors and machine learning. Biosensors. 2020;10:193. https://doi.org/10.3390/bios10120193.

  57. Cui S, Yin Y, Wang D, Li Z, Wang Y. A stacking-based ensemble learning method for earthquake casualty prediction. Appl Soft Comput. 2021;101: 107038. https://doi.org/10.1016/j.asoc.2020.107038.

  58. Liu Y, Yao X. Ensemble learning via negative correlation. Neural Netw. 1999;12:1399–404. https://doi.org/10.1016/s0893-6080(99)00073-8.

  59. Coops NC, Tompalski P, Goodbody TRH, Queinnec M, Luther JE, Bolton DK, White JC, Wulder MA, van Lier OR, Hermosilla T. Modelling lidar-derived estimates of forest attributes over space and time: a review of approaches and future trends. Remote Sens Environ. 2021;260: 112477. https://doi.org/10.1016/j.rse.2021.112477.

  60. Mulverhill C, Coops NC, Hermosilla T, White JC, Wulder MA. Evaluating ICESat-2 for monitoring, modeling, and update of large area forest canopy height products. Remote Sens Environ. 2022;271: 112919. https://doi.org/10.1016/j.rse.2022.112919.

Acknowledgements

We thank the anonymous reviewers for their constructive comments.

Funding

This research is supported by the Natural Science Foundation of China (31971578) and the Hunan Provincial Natural Science Foundation of China (2022JJ30078). This research is also supported by the Scientific Research Fund of Changsha Science and Technology Bureau (kq2004095).

Author information

Contributions

FJ and MD analyzed the results. JT and LF processed the origin data and conducted formal analysis. HS conceived the research idea and designed the experiments. All authors contributed to the manuscript writing and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hua Sun.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Jiang, F., Deng, M., Tang, J. et al. Integrating spaceborne LiDAR and Sentinel-2 images to estimate forest aboveground biomass in Northern China. Carbon Balance Manage 17, 12 (2022). https://doi.org/10.1186/s13021-022-00212-y

  • DOI: https://doi.org/10.1186/s13021-022-00212-y

Keywords

  • Aboveground biomass
  • Carbon cycle and management
  • Remote sensing
  • ICESat-2
  • Google earth engine
  • Machine learning