This vignette will demonstrate the main functionality of
`remotePARTS`

by working through a real remote sensing data
set. The example follows the more-detailed development of the methods in
Ives et al. (2021, Remote Sensing of Environment, https://doi.org/10.1016/j.rse.2021.112678).

First, install/update `remotePARTS`

from github if
needed:

`devtools::install_github("morrowcj/remotePARTS")`

Then, ensure that the package is loaded into your library:

This vignette will use `dplyr`

and `ggplot2`

for visualizing the data:

`remotePARTS`

ships with one spatial data object. This
dataset contains NDVI values derived from NASA’s MODIS satellite for the
US State of Alaska. The first object, `ndvi_AK10000`

, due to
package size limitations, this dataset is a random sampling of 10,000
pixels from the full Alaska dataset. It is important to note, though
that `remotePARTS`

can handle the full map (and much larger
maps).

For this vignette, we well also create a smaller 3000 pixel subsample
`ndvi_AK3000`

for demonstrative purposes:

```
data("ndvi_AK10000")
ndvi_AK3000 <- ndvi_AK10000[seq_len(3000),] # first 3000 pixels from the random 10K
```

`ndvi_AK10000`

is a `data.frame`

with 37
columns. `lng`

and `lat`

are longitude and
latitude, respectively. `AR_coef`

and `CLS_coef`

are pre-calculated coefficient estimates of the time trends in ndvi from
pixel-level time series analyses via AR REML and conditional least
squares, respectively. These coefficient estimates have been
standardized by the mean ndvi value for the pixel over that time period.
`land`

is a factor representing land-cover classes, The
remaining 32 columns, of the form `ndviYYYY`

, contain the
NDVI values from 1982 to 2013. These data sets already have rare land
classes, that occur in less than 2% of pixels, removed. Additionally,
both sets have no missing data. `remotePARTS`

can
**not** handle any missing data. It is essential that
**all** missing data are removed prior to conducting any
analyses.

```
str(ndvi_AK10000)
## 'data.frame': 10000 obs. of 37 variables:
## $ lng : num -151 -145 -144 -163 -145 ...
## $ lat : num 63.3 64.7 62.7 67.7 65.8 ...
## $ AR_coef : num 0.01071 0.003586 0.002488 0.000373 0.002505 ...
## $ CLS_coef: num 0.00529 0.001537 0.002276 0.000476 0.001514 ...
## $ land : Factor w/ 10 levels "Evergr needle",..: 6 7 7 6 7 7 6 6 7 7 ...
## $ ndvi1982: num 1.88 7.68 8.46 6.85 10.25 ...
## $ ndvi1983: num 1.74 8.14 7.99 6.97 10.35 ...
## $ ndvi1984: num 1.96 8.47 8.04 6.67 10.29 ...
## $ ndvi1985: num 1.74 7.72 7.96 6.9 10.02 ...
## $ ndvi1986: num 1.92 8.31 8.64 6.43 10.39 ...
## $ ndvi1987: num 2.04 8.89 8.4 7 10.47 ...
## $ ndvi1988: num 2.13 8.5 8.32 7.12 10.05 ...
## $ ndvi1989: num 1.93 8.99 8.79 6.24 10.16 ...
## $ ndvi1990: num 1.73 8.66 8.45 7.46 10.31 ...
## $ ndvi1991: num 1.99 8.74 8.34 6.86 10.28 ...
## $ ndvi1992: num 1.74 7.99 7.94 6.53 9.77 ...
## $ ndvi1993: num 1.98 8.62 7.93 6.97 10.35 ...
## $ ndvi1994: num 2.04 8.59 8.26 7.15 10.25 ...
## $ ndvi1995: num 1.9 8.1 8.55 7.08 10.18 ...
## $ ndvi1996: num 2.09 8.61 8.23 7.08 10.33 ...
## $ ndvi1997: num 1.81 8.77 8.99 7.24 10.23 ...
## $ ndvi1998: num 1.95 7.9 8.57 7.45 9.99 ...
## $ ndvi1999: num 1.75 8.14 8.61 6.75 10.04 ...
## $ ndvi2000: num 1.71 7.5 8.37 6.7 9.92 ...
## $ ndvi2001: num 1.57 7.76 8.29 6.66 9.69 ...
## $ ndvi2002: num 1.72 7.83 8.62 7.09 9.76 ...
## $ ndvi2003: num 1.95 7.94 8.35 6.87 9.76 ...
## $ ndvi2004: num 2.2 8.37 8.83 7.3 11.74 ...
## $ ndvi2005: num 2.18 8.45 8.63 7.18 11.62 ...
## $ ndvi2006: num 2.06 8.32 8.36 6.74 11.59 ...
## $ ndvi2007: num 2.39 8.53 8.63 7.47 11.91 ...
## $ ndvi2008: num 2.36 8.6 8.41 7 11.8 ...
## $ ndvi2009: num 2.35 10.02 9.27 6.59 11.62 ...
## $ ndvi2010: num 2.95 11.09 9.26 7.09 11.46 ...
## $ ndvi2011: num 2.5 8.9 8.94 7.05 12.15 ...
## $ ndvi2012: num 2.71 8.71 8.39 6.21 10.97 ...
## $ ndvi2013: num 2.54 9 8.95 6.93 10.38 ...
```

For this demonstration, we are interested in asking the following questions using these data: “Is NDVI in Alaska increasing over time?”; “Are Alaska’s NDVI time trends associated with land-cover classes?”; and “Do Alaska’s NDVI time trends differ with latitude?”

The figure below shows a temporal cross-section of these data for 1982, 1998, and 2013.

```
reshape2::melt(ndvi_AK10000, measure = c("ndvi1982", "ndvi1998", "ndvi2013")) %>%
ggplot(aes(x = lng, y = lat, col = value )) +
geom_point(size = .1) +
labs(col = "ndvi") +
facet_wrap(~ gsub("ndvi", "", variable), ncol = 3) +
scale_color_viridis_c(option = "magma") +
labs(x = "Longitude", y = "Latitude")
```

The following figure shows how Alaska’s three primary land-cover classes are distributed.

```
ndvi_AK10000 %>%
ggplot(aes(x = lng, y = lat, col = land)) +
geom_point(size = .1) +
scale_color_viridis_d(direction = -1, end = .9) +
labs(y = "Latitude", x = "Longitude", col = "Land cover", fill = "Land cover")
```

Use `help("ndvi_AK)`

to see documentation for these
datasets.

When using `remotePARTS`

, the data are assumed to follow
the general stochastic process of the form

\[y(t) = X(t)\beta + \varepsilon(t)\] where

\(y(t)\) is a response variable of interest at time \(t\)

\(\beta\) is a vector of coefficients for the predictor variables \(X(t)\) on \(y(t)\)

\(\varepsilon(t)\) is an temporal autoregressive process: \(\varepsilon(t) = \rho \varepsilon(t - 1) + \delta(t)\)

at any time step, \(\delta(t)\) is spatially autocorrelated according to covariance matrix \(\Sigma\): \(\delta(t) \sim N(0, \Sigma)\)

there is no temporal dependence in \(\delta(t)\) (i.e., \(\delta(t_i)\) is independent of \(\delta(t_j)\) except when \(t_i\) = \(t_j\))

The first step in a typical `remotePARTS`

workflow is to
obtain pixel-level estimates of time-series coefficients. In our
example, we are interested in estimating the time trends in NDVI for
each pixel \(i\), represented by \(\beta_1\) in the regression model

\[y_i(t) = \beta_0 + \beta_1 t + \varepsilon_i(t)\]

where the random errors \(\varepsilon_i(t)\) follow an AR(1) process:

\[\varepsilon_i(t) = b\varepsilon_i(t - 1) + \delta_i(t)\] \[\delta_i(t) \sim N(0 , \sigma)\]

We will use `fitAR_map()`

to estimate \(\beta_1\), which fits pixel-level AR(1)
models to a map of pixels and estimates coefficients using restricted
maximum likelihood (REML). To do so, we must extract only our NDVI
columns as the matrix `Y`

. We’ll do this by matching all
column names containing “ndvi” and slicing the data.frame:

```
ndvi.cols <- grep("ndvi", names(ndvi_AK3000), value = TRUE)
Y <- as.matrix(ndvi_AK3000[, ndvi.cols])
```

We also need a 2-column coordinate matrix `coords`

:

`Y`

and `coords`

are then passed to
`fitAR_map()`

with default settings:

Coefficient estimates can be obtained from `ARfit`

with
`coefficients()`

. The first column is the estimate of \(\beta_0\), \(\hat{\beta_0}\), and the second is \(\hat{\beta_1}\).

```
head(coefficients(ARfit))
## (Intercept) t
## [1,] 1.703446 0.021923615
## [2,] 7.980618 0.030464243
## [3,] 8.146784 0.021130056
## [4,] 6.883592 0.002582853
## [5,] 10.081968 0.026466877
## [6,] 7.628119 -0.010395964
```

These time-series analyses calculate the time trend in the raw NDVI
data. In most situations it makes sense to ask if there are time trends
in the relative NDVI values, that is, changes in NDVI relative to the
mean value of NDVI in a pixel. Scaling the trend in NDVI relative to the
mean gives assessments of the **proportional** change in
NDVI. These trends in the proportional NDVI are calculated be dividing
\(\hat{\beta_1}\) by the mean. The
values of the trend coefficients are contained in
`ARfit$coefficients`

, and since the coefficients for the
trend are in the column of the coefficients matrix named `t`

,
the scaling is performed as

```
ARfit$coefficients[, "t"] <- ARfit$coefficients[,"t"]/rowMeans(ndvi_AK3000[, ndvi.cols])
ndvi_AK3000$AR_coef <- coefficients(ARfit)[, "t"] # save time trend coefficient
```

These scaled values of the time trend are then stored in the
`ndvi_AK3000`

data frame.

Below is an image of the estimated coefficients (pre-calculated and
scaled) for the full `ndvi_AK10000`

. From this, it appears
that northern latitudes may be greening faster than more southern
latitudes.

```
ndvi_AK10000 %>%
ggplot(aes(x = lng, y = lat, col = AR_coef)) +
geom_point(size = .1) +
scale_color_gradient2(high = "red", low = "blue",
mid = "grey90", midpoint = 0) +
guides(fill = "none") +
labs(y = "Latitude", x = "Longitude", col = expression(beta[1]))
```

`fitAR_map`

and its conditional least-squares counterpart,
`fitCLS_map`

, are wrappers for the functions
`fitAR`

and `fitCLS`

which conduct individual time
series analysis. If the user wants, these can be applied on a
pixel-by-pixel basis to the data to allow greater flexibility. Both AR
REML and CLS methods account for temporal autocorrelation in the time
series. See function documentation for more details (i.e.,
`?fitAR_map()`

).

Now that we’ve reduced the temporal component of each time series to a single value (i.e., estimates of \(\beta_1\)) while accounting for temporal autocorrelation, we can focus on the spatial component of our problem.

The first step is to calculate the distances among pixels as a
distance matrix `D`

. Here, we’ll calculate relative distances
with `distm_scaled()`

from our coordinate matrix.

```
D <- distm_scaled(coords)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
```

`distm_scaled()`

scales distances across the spatial
domain so that the greatest distance between two pixels is 1. Note that
because `distm_scaled()`

uses
`geosphere::distm()`

, it treats coordinates as degrees
longitude and latitude on a sphere and calculates distances
accordingly.

Next, we need to estimate the expected correlation between the random
errors of our spatial response variable (estimates of \(\beta_1\)) based on their distances. To do
so, we need a spatial covariance function. In this example, we will use
an exponential covariance function to estimate correlations: \(V = \exp\big(\frac{-D}{r}\big)\) where
\(r\) is a parameter that dictates the
range of spatial autocorrelation. The function `covar_exp()`

corresponds to this covariance function.

`V`

represents the correlation among points if all
variation is accounted for. However, it is safest to assume that there
is some additional source of unexplained and unmeasured variance (a
nugget \(\eta\)). Therefore, we assume
that the covariance structure among pixels is given by \(\Sigma = \eta I + (1-\eta)V\) where \(I\) is the identity matrix.

If we know the range parameter \(r\), we can calculate `V`

from
`D`

with `covar_exp()`

:

And we could add a known nugget to `V`

to obtain
`Sigma`

:

See `?covar_exp()`

for a description of the covariance
functions provided by `remotePARTS`

and for more information
regarding their use.

To test spatial hypotheses with `remotePARTS`

, we use a
generalized least-squares regression model (GLS):

\[\theta = X\alpha_{gls} + \gamma \] where

\(\theta\) is a vector of response values

\(\alpha_{gls}\) is a vector of the effects of predictor variables \(X\) on \(\theta\)

the error term \(\gamma\) is spatially autocorrelated according to \(\Sigma_\gamma\): \(\gamma \sim N(0, \Sigma_\gamma)\)

\(\theta\) will usually be a regression parameter. For example, if we’re interested in understanding trends in NDVI over time, we would use the pixel-level regression coefficient for the effect of time on NDVI (i.e., \(\theta = \hat\beta\))

If the parameters that govern the spatial autocorrelation \(\Sigma_\gamma\) are known, a GLS can be fit
with `fitGLS()`

. Here, we will fit the GLS by providing (i) a
model formula, (ii) a data source, (iii) our `V`

matrix,
which was pre-calculated with a spatial parameter \(r = 0.1\) and (iv) a nugget of \(\eta = 0.2\).

The specific task in this examples is to estimate the effect of
land-cover class on our time trend. Because `land`

is a
factor, we’ll also specify a no-intercept model.

Note that the GLS fitting process requires an inversion of
`V`

. This means that even with only the 3000-pixel subset, it
will take a few minutes to finish the computations on most
computers.

Note also that `fitGLS`

adds the nugget to `V`

internally. If we wanted to do this ourselves, we could pass the
covariance matrix `Sigma`

which already contains a nugget
component and then set the `nugget`

argument of
`fitGLS`

to 0:

The estimates for our land class effects can be extracted with
`coefficients()`

.

```
coefficients(GLS.0)
## landShrubland landSavanna landGrassland
## 0.0008646565 0.0002554862 0.0011294576
```

The full model fit is given by

```
GLS.0
##
## Call:
## fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V,
## nugget = nugget)
##
## t-tests:
## Est t.stat pval.t
## landShrubland 0.0008647 0.5065 0.6126
## landSavanna 0.0002555 0.1493 0.8813
## landGrassland 0.0011295 0.6648 0.5062
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 0 + land 2 0.1128 3.764e-05 12808 4.461 0.01163
## 2 AR_coef ~ 1 2997 0.1131 3.775e-05 12802 NA NA
```

Note that, although none of the three land-cover classes shows a statistically significant time trend in (proportional) NDVI, the F-test shows that there is a statistically significant difference among land-cover classes, because the model including land-cover classes gives a statistically significantly better fit to the data than the intercept-only model.

In practice, we rarely know the values of the parameters that govern spatial autocorrelation (e.g., the range and nugget) in advance. Therefore, these parameters will need to be estimated for most data.

The spatial parameters of a covariance function (e.g.,
`covar_exp`

) can be estimated from residuals of pixel-level
time-series models (see Ives et al. RSE, 2021). Although we conducted
the time-series analyses as though each pixel was independent (with
`fitAR_map()`

), they are, in fact, dependent. Specifically,
the correlation of the residuals from the pixel-level analyses is
roughly proportional to the spatial autocorrelation of the residuals of
the spatial model, \(\gamma\), if all
of the variation in \(\gamma\) is due
to the spatiotemporal variation produced by \(\varepsilon_i(t)\). Therefore, we can
estimate the range parameter for **V** as \(V_{ij} \approx \text{cor}\big(\varepsilon_i(t),
\varepsilon_j(t)\big)\)

The function `fitCor()`

performs the estimation of spatial
parameters. We will pass to this function (i) the time-series residuals
for our map, extracted from the time-series analysis output object with
`residuals()`

, (ii) the coordinate matrix
`coords`

, (iii) the covariance function
`covar_exp()`

, and (iv) a list specifying that we should
start the search for the optimal range parameter at 0.1. For this
example, we will also specify `fit.n = 3000`

, which ensures
that all pixels are used to estimate spatial parameters.

```
corfit <- fitCor(resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp",
start = list(range = 0.1), fit.n = 3000)
(range.opt = corfit$spcor)
## range
## 0.1083729
```

By default, `fitCor()`

uses `distm_scaled()`

to
calculate distances from the coordinate matrix, but any function that
returns a distance matrix can be specified with the
`distm_FUN`

argument. It is important to scale the parameter
values appropriately, accounting for your distances. For example, if we
instead use `distm_km()`

to calculate distance in km instead
of relative distances, we would need to scale our starting range
parameter by the maximum distance in km of our map:

```
max.dist <- max(distm_km(coords))
corfit.km <- fitCor(resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp",
start = list(range = max.dist*0.1), distm_FUN = "distm_km", fit.n = 3000)
```

Note that, depending on the covariance function used, not all
parameters will need scaling. For example, `covar_exppow()`

is an exponential-power covariance function and takes a range and shape
parameter, but only the range parameter should scale with distance. See
`?fitCor()`

for more details.

After we’ve obtained our range parameter estimate, we can use it to
re-calculate the `V`

matrix:

Similar to finding the optimal spatial parameters, the nugget can be
estimated by selecting a nugget that maximizes the likelihood of the GLS
given the data. `fitGLS()`

will find this maximum-likelihood
nugget when `nugget = NA`

is specified. Note that this type
of optimization requires fitting multiple GLS models, which means it
will be much slower than our call to `fitGLS()`

with a known
nugget.

In addition to our original arguments, we’ll also explicitly set
`no.F = FALSE`

so that F-tests are calculated. For the
F-tests, the default reduced model is the intercept-only model, although
it is also possible to specify alternative reduced models as a formula
in the `formula0`

option.

```
GLS.opt <- fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt, nugget = NA, no.F = FALSE)
(nug.opt = GLS.opt$nugget)
## [1] 0.1342314
coefficients(GLS.opt)
## landShrubland landSavanna landGrassland
## 0.0009201230 0.0003899361 0.0011467324
```

Let’s compare our GLS from earlier with this one with optimized parameters:

```
rbind(GLS.0 = c(range = r, nugget = GLS.0$nugget, logLik = GLS.0$logLik, MSE = GLS.0$MSE),
GLS.opt = c(range = range.opt, nugget = GLS.opt$nugget, logLik = GLS.opt$logLik, MSE = GLS.opt$MSE))
## range nugget logLik MSE
## GLS.0 0.1000000 0.2000000 12807.91 3.763878e-05
## GLS.opt 0.1083729 0.1342314 12809.46 5.003165e-05
```

Note that in this example, `logLik`

for
`GLS.opt`

is not functionally different than
`logLik`

for `GLS.0`

. This indicates that using
the values of `range`

= 0.1 and `nugget`

= 0.2
gives a similar likelihood than the optimal model when
`range`

is constrained to be the value calculated from
`covar_exp()`

, `GLS.opt`

.

It is also possible to simultaneously estimate spatial parameters and
the nugget without using the time-series residuals. This is done by
finding the set of parameters describing spatial autocorrelation (e.g.,
`range`

and `nugget`

) that maximizes the
likelihood of a GLS given the data. This task is computationally slower
than optimizing `nugget`

alone with `fitGLS()`

and
therefore will take some time to run.

```
fitopt <- fitGLS_opt(formula = AR_coef ~ 0 + land, data = ndvi_AK3000,
coords = ndvi_AK3000[, c("lng", "lat")],
covar_FUN = "covar_exp",
start = c(range = .1, nugget = .2),
method = "BFGS", # use BFGS algorightm (see ?stats::optim())
control = list(reltol = 1e-5) # lower the convergence tolerance (see ?stats::optim())
)
fitopt$opt$par
# range nugget
# 0.02497874 0.17914929
fitopt$GLS$logLik
# [1] 12824.77
fitopt$GLS$MSE
# [1] 2.475972e-05
```

Note that, because `fitGLS_opt()`

does not require time
series residuals, it is possible to use `fitGLS_opt()`

for
statistical problems involving only spatial variables. In other words,
rather than \(\theta\) being a limited
to a time trend, it can be a purely spatial variable as well.

When time-series residuals are available, we recommend that you
estimate spatial parameters with `fitCor()`

and
`fitGLS()`

, rather than `fitGLS_opt()`

. In
simulation studies, using `fitCor()`

with
`fitGLS()`

often has better statistical performance than
using `fitGLS_opt()`

. See `?fitGLS_opt()`

for more
information about this function and its use.

The purpose of the tools provided by `remotePARTS`

is to
test map-level hypotheses about spatiotemporal data sets. In this
example, we will test 3 hypotheses using 3 different GLS models. Note
that these hypothesis are framed as regression-style problems; indeed,
`fitGLS()`

is essentially regression with spatially
autocorrelated random errors.

If we want to test the hypothesis that “there was a trend in Alaska NDVI Alaska from 1982-2013”, we can regress the AR coefficient on an intercept-only GLS model:

```
(GLS.int <- fitGLS(AR_coef ~ 1, data = ndvi_AK3000, V = V.opt, nugget = nug.opt, no.F = TRUE))
##
## Call:
## fitGLS(formula = AR_coef ~ 1, data = ndvi_AK3000, V = V.opt,
## nugget = nug.opt, no.F = TRUE)
##
## t-tests:
## Est t.stat pval.t
## (Intercept) 0.000972 0.4564 0.6481
##
## Model statistics:
## SSE MSE logLik
## 1 0.1503 5.011e-05 12806
```

We can see from the t-test that the intercept is not statistically different from zero. In other words, there is no map-level temporal trend in NDVI across the entire data set. We have not performed an F-test, because the full model is the intercept-only model and is therefore the same as the reduced model.

If we want to test the hypothesis that “trends in Alaskan NDVI differ
by land- cover class”, we can use `GLS.opt()`

from
earlier:

```
GLS.opt
##
## Call:
## fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt,
## nugget = NA, no.F = FALSE)
##
## t-tests:
## Est t.stat pval.t
## landShrubland 0.0009201 0.4306 0.6668
## landSavanna 0.0003899 0.1822 0.8554
## landGrassland 0.0011467 0.5385 0.5903
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 0 + land 2 0.1499 5.003e-05 12809 3.263 0.03841
## 2 AR_coef ~ 1 2997 0.1503 5.014e-05 12805 NA NA
```

The t-tests show that the trend in NDVI, for all land-cover classes,
was not statistically different from zero, meaning that NDVI did not
show a statistically significant trend in any land-cover class. The
F-test (ANOVA table), however, shows that time trends in NDVI differ
among the land-cover classes. The better fit of the model with
land-cover classes can also be seen in the increase in the likelihood
(`logLik`

) compared to the intercept-only model.

Finally, to test the hypothesis that “temporal trends in NDVI differ with latitude”, we can regress the AR coefficient on latitude in our GLS model:

```
(GLS.lat <- fitGLS(AR_coef ~ 1 + lat, data = ndvi_AK3000, V = V.opt, nugget = nug.opt, no.F = FALSE))
##
## Call:
## fitGLS(formula = AR_coef ~ 1 + lat, data = ndvi_AK3000, V = V.opt,
## nugget = nug.opt, no.F = FALSE)
##
## t-tests:
## Est t.stat pval.t
## (Intercept) -8.617e-04 -0.04362 0.9652
## lat 2.986e-05 0.09337 0.9256
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 1 + lat 1 0.1503 5.012e-05 12806 0.008719 0.9256
## 2 AR_coef ~ 1 2998 0.1503 5.012e-05 12806 NA NA
```

The t-tests show that temporal trends in NDVI did not differ with latitude. Note that the p-value from the F-test is equivalent to that of the t-test p-value for effect of latitude.

We can see from these hypothesis tests that, at least among the 3000-pixel sub-sample of Alaska, the answer to all three questions that we posed is no: there is no statistical evidence for an overall greening in Alaska, nor differences among land-cover classes or latitude.

Until now, we have limited our analyses to the 3000-pixel subset of
Alaska, `ndvi_AK3000`

. Calls to `fitGLS()`

involve
inverting `V`

, and the computational complexity scales with
\(N^3\) where \(N\) is the number of pixels in the map. We
have used the data set `ndvi_AK3000`

up to now because the
computation time for the analyses is reasonable. However, 3000 pixels
means dealing with distance and covariance matrices that each contain
\(3,000 \times 3,000 = 9,000,000\)
elements. This is approaching the upper size limit for obtaining
relatively fast results.

In contrast, the covariance matrix for the full
`ndvi_AK10000`

data set would have \(10,000 \times 10,000 = 100,000,000\)
elements which creates a computationally infeasible problem for a normal
computer.

For these reasons, `fitGLS_paritition`

may be the most
useful function in the `remotePARTS`

package. This function
can perform the GLS analysis on the full `ndvi_AK10000`

data
set. In fact, `ndvi_AK10000`

is quite small in comparison to
many remote sensing data sets that could be analyzed with
`fitGLS_partition()`

.

`fitGLS_parition()`

conducts GLS analyses on partitions of
the data and then combines the results from the partitions to give
overall statistical results. Specifically, this process (1) breaks the
data into smaller and more manageable pieces (partitions), (2) conducts
GLS on each partition, (3) calculates cross-partition statistics from
pairs of partitions, and (4) summarizes the results with statistical
tests that account for correlations among partitions. We will use the
full `ndvi_AK10000`

data set to demonstrate
`fitGLS_parition()`

.

We have already performed the time-series analyses on the full data
set so you don’t have to. These are in the `AR_coef`

column
of `ndvi_AK10000`

. However, we used a complete data set, so
you will need to remove rare land-cover classes.

Step (1) is to divide pixels up into partitions, which is done with
the function `sample_parition()`

. Passing
`sample_partitions`

, the number of pixels in our map, and the
argument `partsize = 1500`

will result in a partition matrix
with 1,500 rows and 20 columns. Columns of the resulting partition
matrix `pm`

each contain a random sample of 1,500 pixels.
Each of these 20 samples (partitions) are non-overlapping, containing no
repeats. Setting `npart = NA`

will automatically give the
maximum number of partitions possible (i.e., 20).

Once we have our partition matrix, `fitGLS_parititon()`

performs steps (2)-(4) of the analyses. The input is similar to
`fitGLS`

. For this example, we specify (i) a formula, (ii)
the data as a data.frame (`df`

), (iii) the partition matrix
(partmat `pm`

), (iv) the covariance function
(`covar_FUN`

), (v) a list of spatial parameters
`covar.pars`

including our optimized range parameter, and a
`nugget`

. If `nugget`

is specified, this value is
used for the calculations, while if `nugget`

=
`NA`

(the default) it is estimated for each partition
separately.

Note that, although the compute time is much faster than if we needed
to invert the full covariance matrix `V`

, this example still
takes a long time to fit. Therefore, we have saved the output of this
code as an R object `partGLS.ndviAK`

so that you can look at
its output without having to execute the function.

The model was fit with this code:

```
partGLS_ndviAK <- fitGLS_partition(formula = AR_coef ~ 0 + land, data = df,
partmat = pm, covar_FUN = "covar_exp",
covar.pars = list(range = range.opt),
nugget = nug.opt, ncores = 4)
```

and can be loaded with

Here are the t-tests, that show that land cover class does not significantly affect NDVI trend:

```
partGLS_ndviAK$overall$t.test
## $p.t
## Est SE t.stat pval.t
## landShrubland 0.0013094359 0.001899589 0.6893260 0.4906359
## landSavanna 0.0004379666 0.001902064 0.2302586 0.8178960
## landGrassland 0.0016210956 0.001894683 0.8556027 0.3922404
##
## $covar_coef
## [,1] [,2] [,3]
## [1,] 3.623978e-06 3.569499e-06 3.561642e-06
## [2,] 3.566697e-06 3.633439e-06 3.537596e-06
## [3,] 3.562415e-06 3.540957e-06 3.605232e-06
```

It is **highly** recommended that users read the full
documentation (`?fitGLS_parition()`

) before using
`fitGLS_partition`

to analyze any data.

Here is the p-value for the chisqr test of the partitioned GLS

This again, indicates that the model which includes land cover
classes better explains variation in NDVI trends than the intercept-only
model. Note that the p-value is much lower than the p-value from the
F-test conducted by `fitGLS()`

. This is likely due to
outliers in the data, which should be removed before conducting any real
analysis. One simple way to filter potential trend outliers would be to
remove any pixels whose time-series coefficient standard errors are
unusually large (e.g., `SE > 4*mean(SE)`

).

Our parititoned GLS has returned the same conclusions as our standard GLS analysis. No map-level trend in NDVI was found within any of the land cover classes.

This was only one of our three hypotheses tested earlier. The
remaining two can easily be tested with `fitGLS_partition()`

,
as they were with `fitGLS`

, by changing the formula
argument.