### Study area

Perú is an ideal test case for investigating whether field plot sampling can accurately map carbon because its forests span a wide range of topographic, climatic, floristic, and geologic variables. Heterogeneity in aboveground carbon density (ACD) is driven by combinations of these variables, and higher ACD heterogeneity leads to less accurate field plot estimates of ACD [7]. Results from within the diverse environmental gradients of Perú can then be applied to other countries of similar conditions.

We used the 1-ha resolution aboveground carbon density map of Perú created by Asner et al. [6] as a model of the “true” carbon stocks of the country. We refer to this LiDAR-estimated carbon map throughout the paper as “the model carbon stock” or “the model aboveground carbon density (ACD).” The model ACD map was produced from a countrywide airborne LiDAR campaign with the Carnegie Airborne Observatory (CAO; cao.carnegiescience.edu). The LiDAR data were integrated with high-resolution satellite imaging, a large field plot sampling network (to calibrate/validate the LiDAR carbon mapping), and an advanced geospatial scaling algorithm [for more details see 6]. Please refer to Fig. S5 of [6] for model validation results, which compared 536,874 ha of LiDAR-measured forest TCH that were not used to train the model-estimated TCH of those same locations. The R^{2} is 0.78 and the RMSE is 3.50 m. While the underlying relationship between LiDAR measured-TCH and estimated ACD may be heteroskedastic, with a higher ACD errors found at taller TCH, the relative ACD uncertainty that results is low (i.e., <20 % for high ACD (>120 MgC ha^{−1}) lowland forests).

We masked this map to exclude areas that are not forested, using a mean per hectare threshold of >70 % photosynthetic vegetation and >2 m top-of-canopy height (TCH). The photosynthetic vegetation map was created from a national-scale mapping of Perú using the CLASlite algorithm with mosaicked Landsat satellite imagery [26]. The TCH map is the underlying dataset used to produce the ACD map of Perú [6]. This left a forested area of 76,457,286 ha for the analysis. While this is 59.5 % of Perú by land area, it contains 98.5 % of the total carbon of the entire country (6.798 Pg). All analyses were performed using R statistical software [27] with geospatial manipulations and analyses performed using the ‘raster’ package [28].

### Environmental stratification layers

We compiled a set of six co-aligned, 1-ha resolution environmental stratification layers (hereafter ‘strata’) comprised of topographic and climatic variables covering all of Perú. Cloudiness (%) was created from long-term (2000–2010) NASA moderate resolution imaging spectroradiometer (MODIS) data and described in further detail by [6]. Mean annual precipitation (MAP) was calculated from NASA Tropical Rainfall Measuring Mission (TRMM) 2b31 monthly data (1998–2011), and dry season length (DSL) was calculated using the same TRMM data for all months with less than 100 mm of precipitation. Elevation, slope, and a relative elevation model (REM) were created from NASA Shuttle Radar Topography Mission (SRTM) at 90 m resolution. Relative elevation is a proxy for vegetation related water resources, and is developed by calculating the height of the ground above nearest water body [29]. Each stratum was masked by the same forest mask described above.

### Field plot sampling designs

We tested two commonly used field plot sampling strategies: systematic sampling and stratified random sampling. This study is not meant to be an exhaustive review of spatial sampling techniques, rather to evaluate two methods that are commonly employed or recommended to map forest carbon over large geographic areas [9, 10, 12, 13]. Therefore, we did not consider other possible spatial sampling designs (e.g., simple random sampling, cluster sampling).

Systematic sampling uses sampling units (i.e., field inventory plots) placed at regular intervals according to an ordering scheme. For the country of Perú, we chose regularly spaced square grids with dimensions ranging from 5–100 km, increasing at 1-km intervals. Only those sampling units on the grid that fall within the bounds of the forest mask were retained (hereafter referred to as systematic grid sampling).

Stratified sampling places sampling units across a region according to pre-defined subregions that are more homogenous than the region as a whole, thereby reducing inherent sampling errors. The degree to which a stratified sampling technique accurately estimates the true population depends largely upon choosing homogenous subregions from which to conduct the subsampling [30]. We used unique combinations of the six strata described above to create these subregions (hereafter referred to as ‘substrata’). First each stratum was binned by quintile, creating 5 categorical classes for each stratum (Fig. 1b–g). Then a map of all unique combinations of the six quintile-binned strata was produced, and any resulting substrata less than 1000 ha in area was excluded (hereafter referred to as ‘quintile-binned substrata’). This resulted in 5398 unique substrata totaling 98.2 % of the forested area used for the analyses. We repeated the above steps, instead binning each stratum by 5 equal subsets of the total range of values of each stratum (hereafter referred to as ‘range-binned substrata’) (Additional file 1: Fig. S3). Again removing any substrata with less than 1000 ha, resulting in 447 substrata representing 99.9 % of the forested area used for the analyses. Sampling units were then randomly selected from within each substratum according to the simulations described below (hereafter referred to as ‘stratified random sampling’).

### Estimating total national carbon stocks

#### Systematic grid

For each systematic sampling grid, the ACD value extracted from the centroid of each sampling grid cell was used as the ACD value for all of the 1-ha values in that sampling grid cell. The total number of grid cells in a systematic grid represents the number of field plots needed to create the estimated ACD map. We produced 96 different estimated ACD maps, each with a spatial resolution corresponding to the sampling grid dimensions used to create the map (5–100 km in 1 km increments). The ACD was summed across the entire country for each of the different estimated ACD maps, and compared to the total of the model ACD map.

#### Stratified random

We used Monte Carlo simulations to determine the probability of a randomly placed set of field inventory plots within each quintile-binned substratum to accurately estimate the total carbon stocks of Perú. For all of the 5398 substrata, a random sample of ACD values drawn from all ACD values of the substratum was selected and used to estimate the mean ACD of the substratum. The mean ACD of each substratum was multiplied by the total number of hectares in the substratum to get the estimated total ACD of that substratum. All estimates of total ACD from the substrata were then summed to get an estimated total ACD of the country, and compared to the total of the model ACD map. This was repeated 5000 times and the probability of correctly estimating the total ACD of Perú to within 10 % was determined. This simulation was run for sample sizes of 1 through 100 field plots in each substratum. We repeated this same simulation, but at each iteration progressively removing the smallest substrata (by area) in increments of 5 %. We used the mean ACD of the sampled substrata to estimate the total ACD of the substrata that were removed from the simulation. We did not test the range-binned substrata because of its poor performance in estimating the mean ACD of a substratum (see ‘‘Uncertainty simulations and extraction’’ section below).

### Estimating national carbon geographies

#### Systematic grid

Each 1-ha resolution systematic grid estimated ACD map (see above) represents a stratify-and-multiply upscaling approach whereby the sampled value from the systematic grid (which can be thought of as stratification in this context) is applied to the corresponding unsampled population (the rest of the 1-ha values in the systematic grid cell). Each estimated ACD map was co-aligned with the model ACD map of Perú, subtracted from the model ACD map, and the difference divided by the model ACD map to get the relative percent difference on a per hectare basis. We also performed a model-linked upscaling approach using random forest machine learning in addition to the stratify-and-multiply upscaling approach. We used the systematic sampling grid values as the training dataset for each random forest model. For the predictors of the model, we used the same 12 contiguous remote sensing layers that were used to create the model ACD map [see 6]. For each systematic grid, a random forest model was fit with the sampled values and their co-aligned values from the remote sensing layers. Each fitted model was then applied to the entire country using the 12 remote sensing layers to create a wall-to-wall random forest estimated ACD map [for more details on the random forest approach see [31]. This map was then compared to the model ACD map in the same manner as above.

#### Stratified random

We chose a subset {1, 5, 10, 50, 100} of field plot sets used to randomly sample each substratum to create a stratified random estimated ACD map using a stratify-and-multiply approach. For each substratum, a field plot set was randomly drawn from all ACD values of the substratum, and the mean of the set was mapped back onto all hectares of that substratum. This produced a set of maps of estimated ACD for the entire country of Perú at 1-ha resolution based on the stratified random sampling approach. For each map created from one of the five field plot sets, the stratified random estimated ACD map was subtracted from the model ACD map, and the difference divided by the model ACD map to get the relative percent difference. Again, we did not test the range-binned strata for the reason described in the preceding section. We also applied a model-linked upscaling approach using the random forest machine learning algorithm. Here the training datasets were composed of the stratified random sample values and their co-aligned values from the 12 remote sensing layers. Random forest models were fit and applied in the same way as described above.

### Uncertainty simulations and extractions

We ran Monte Carlo simulations to find the number of field plots it would take to accurately estimate the mean ACD of a substratum for the stratified random sampling approach. For each substratum, a random sample of ACD values (representing field plots) was selected from all possible ACD values of the substratum and the mean of the sample was compared to the mean of all ACD values of the substratum. This was repeated 5000 times to find the probability that the selected number of field plots would produce an estimate accurate to within 10 % of the mean of the substratum as estimated by the model ACD map. This simulation was run for sample sizes of 1 through 100 field plots. We then mapped the carbon value (either mean or total) and the environmental strata value of each substratum onto the results of these simulations to examine potential patterns.

We co-aligned the spatially explicit maps of estimated ACD created from both sampling approaches with the original (non-binned) strata. For each sampling approach, we created a median estimated ACD map across all of the sampling grids (systematic sampling, 5–100 km) and plot sets (stratified random, {1, 5, 10, 50, 100} plots). We then extracted the percent relative difference between the median estimated ACD map and the model ACD map. We also extracted the six climatic and topographic values associated with each mapped hectare.