Welcome to the `neonSoilFlux`

package! This vignette will
guide you through the process of using this package to acquire and
compute soil CO\(_{2}\) fluxes at
different sites in the National Ecological Observatory Network.

You can think about this package working in two primary phases:

- acquiring the environment data for a given month at a NEON site
(
`acquire_neon_data`

). This includes:- Soil temperature at different depths.
- Soil water content at different depths.
- Soil CO\(_{2}\) concentration.
- Atmospheric pressure
- Soil properties (bulk density, others)

- Given those properties, computing the soil surface fluxes and the
associated uncertainty using a variety of methods to compute fluxes
(
`compute_neon_flux`

).

We split these two functions in order to optimize time and that both
were fundamentally different processes. Acquiring the NEON data makes
use of the `neonUtilities`

package.

This package takes the guess work out of which data products to
collect, hoping to reduce the workflow needed. We rely very much on the
`tidyverse`

philosophy for computation and coding here.

Load up the relevant libraries:

Let’s say we want to acquire the NEON soil data at the
`SJER`

site during the
month June in 2021:

The output `out_env_data`

for
`acquire_neon_data`

is a list of lists:

- The first element is
`site_data`

, a nested data frame containing measurements for the required flux gradient model during the given time period. - The second element is
`site_megapit`

, a nested frame containing specific information about soils at the site (for bulk density calculations, etc)

Two required inputs are needed to run the function acquire_neon_data:

- NEON site name (a four digit code standard by NEON)
- Download date, a string in the YYYY-MM format
- Optional arguments include
`time_frequency`

, which is 30 minutes (the default) or the 1 minute data (currently untested) and if we download provisional NEON data.

As the data are acquired various messages from the
`loadByProduct`

function from the `neonUtilities`

package are shown - this is normal. Products are acquired from each
spatial location (`horizontalPosition`

) or vertical depth
(`verticalPosition`

) at a NEON site

Outputs for `acquire_neon_data`

are two nested data
frames:

`site_data`

This contains three variables: the measurement name (one of`soilCO2concentration`

,`VSWC`

(soil water content),`soilTemp`

(soil temperature), and`staPres`

(atmospheric pressure)),`monthly_mean`

contains the mean value of the measurement at each horizontal and vertical depth. We compute the monthly mean using a bootstapped technique.`data`

which contains the stacked variables acquired from neonUtilities - the horizontal and vertial positions, timestamp (in UTC), associated values, the QF flag (0 = pass, 1 = fail, LINK)`site_megapit`

: the nested data frame of the soil sampling data, found here (LINK). This data table is essential what is reported back from acquiring the data product from NEON.

For each data product, the `acquire_neon_data`

function
also performs two additional checks:

- The soil water content data product requires some additional
calibration to correct both the soil sensor depth and calibration in the
function
`swc_correct`

. Information about regarding this correction is found here: LINK. Once updated sensors are installed in the future we will depreciate this function. - The actual measurement depth (in meters) is extracted for each position.
- The monthly mean for each measurement at each depth is computed, described below.

The monthly mean is utilized when a given measurement fails final QF
checks. This function is provided by code
from Zoey Werbin. For a
location (`horizontalPosition`

) given depth and A monthly
mean is computed when there are at least 15 days of measurements. Assume
you have a vector of measurements \(\vec{y}\), standard errors \(\vec{\sigma}\), and expanded uncertainty
\(\vec{\epsilon}\) (all of length \(M\)) that passes the QF checks in a given
month. The expanded uncertainty \(\vec{\epsilon}\) is generated by NEON to be
include the 95%
confidence interval. We have that \(\vec{\sigma}_{i}\leq\vec{\epsilon}_{i}\).
We define the bias \(\vec{b}=\sqrt{\left(\vec{\epsilon}\right)^{2}-\left(\vec{\sigma}\right)^{2}}\)
to be the quadrature difference between the expanded uncertainty and the
standard error.

We generate a bootstrap sample of the mean \(\overline{y}\) and standard error \(\overline{s}\) the following ways. Here we set the number of bootstrap samples \(N\) to be 5000. Entries for \(\overline{y}_{i}\) and \(\overline{s}_{i}\) are determined by the following:

- Randomly sample from the uncertainty and bias independently: \(\vec{\sigma}_{j}\) and the bias \(\vec{b}_{k}\) (not necessarily the same sample)
- Generate \(N\) random samples from
a normal distribution with mean \(\vec{y}\) and standard deviation \(\vec{\sigma}_{j}\). Since \(M<N\),
`R`

will recycle the vector \(\vec{y}\) so that this sample is of length \(M\). We will call the sample of \(\vec{y}\) as \(\vec{x}\). - With these \(N\) random samples, \(\overline{y}_{i}=\overline{\vec{x}}+\vec{b}_{k}\) and \(s_{i}\) is the sample standard deviation of \(\vec{x}\). We expect that \(s_{i} \approx \vec{\sigma}_{j}\).

Once that is complete, the reported monthly mean and standard deviation is \(\overline{\overline{y}}\) and \(\overline{s}\).

With the resulting output from `acquire_neon_data`

, you
can then unnest the different data frames to make plots, for
example:

Once we have `out_env_data`

from
`acquire_neon_flux`

, we then compute the fluxes at this
site:

```
out_fluxes <- compute_neon_flux(input_site_env = out_env_data$site_data,
input_site_megapit = out_env_data$site_megapit
)
```

The resulting data frame `out_fluxes`

has the following
variables:

`startDateTime`

: Time period of measurement (as POSIXct)`horizontalPosition`

: Sensor location where flux is computed`flux_compute`

: A nested tibble with variables (1)`flux`

,`flux_err`

, and`method`

(one of 4 implemented)`diffusivity`

: Computation of surface diffusivity`VSWCMeanQF`

: QF flag for soil water content across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail`soilTempMeanQF`

: QF flag for soil temperature across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail`soilCO2concentrationMeanQF`

: QF flag for soil CO2 concentration across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail`staPresMeanQF`

: QF flag for atmospheric pressure across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail

A QF measurement fails when there is a monthly mean could not be
computed for a measurement. Note that this would cause
**all** flux calculations to fail at that given horizontal
position.

You can see the distribution the QF flags for each environmental
measurement with `env_fingerprint_plot`

:

- “Pass” means that for the given timepoint, the monthly mean was not used or the sensor was not offline. This is the highest quality measurement.
- “Monthly Mean” means that for the given timepoint the measurement value was replaced by the monthly mean.
- “Fail” means that no measurement was available. This occurs if there is not sufficient data to compute the monthly mean. When a measurement fails it usually will be for the entire month.

Similarly, you can see the distribution of QF flags for each
diffusivity and flux computation with
`flux_fingerprint_plot`

:

- “Pass” means that for the given timepoint, the computed flux measurement was not NA or positive (the sign of the derived flux conformed to expectations). Monthly means could be used in the computation.
- “Fail” means that the flux was not computed. This occurs if there is not sufficient data to compute the monthly mean (one environmental measurement was “Fail”), or the computed flux was negative.