\documentclass[10pt]{article}
\usepackage{graphicx}
\usepackage{Sweave}
\usepackage{bm}
\usepackage[bottom=0.5cm, right=1.5cm, left=1.5cm, top=1.5cm]{geometry}
% \VignetteIndexEntry{Guide to Function Objects in Spatstat}
% $Revision: 1.5 $ $Date: 2024/07/25 01:50:47 $
\newcommand{\pkg}[1]{\texttt{#1}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\link}[1]{#1}
\newcommand{\R}{{\sf R}}
\newcommand{\spst}{\pkg{spatstat}}
\newcommand{\Spst}{\pkg{Spatstat}}
\newcommand{\fv}{\texttt{"fv"}}
\newcommand{\env}{\texttt{"envelope"}}
\newcommand{\rat}{\texttt{"rat"}}
\newcommand{\obj}[1]{object of class {#1}}
\newcommand{\objs}[1]{objects of class {#1}}
\newcommand{\objsfvenv}{\objs\fv{} and \env}
\newcommand{\fun}[1]{\texttt{#1}}
\newcommand{\class}[1]{\texttt{"{#1}"}}
\newcommand{\Kfun}{$K$-function}
\newcommand{\Lfun}{$L$-function}
\newcommand{\pois}[1]{{#1}_{\mbox{\scriptsize pois}}}
\newcommand{\isoest}[1]{\widehat{#1}_{\mbox{\scriptsize iso}}}
\newcommand{\figref}[1]{Figure~\ref{#1}}
\newcommand{\secref}[1]{Section~\ref{#1}}
\newcommand{\eqref}[1]{(\ref{#1})}
\begin{document}
\bibliographystyle{plain}
<<>>=
library(spatstat)
x <- read.dcf(file = system.file("DESCRIPTION", package = "spatstat"),
fields = c("Version", "Date"))
sversion <- as.character(x[,"Version"])
sdate <- as.character(x[,"Date"])
options(useFancyQuotes=FALSE)
setmargins <- function(...) {
options(SweaveHooks=list(fig=function() par(mar=c(...)+0.1)))
}
@
<<>>=
options(SweaveHooks=list(fig=function() par(mar=c(5,4,2,4)+0.1)))
options(width=100)
@
\SweaveOpts{eps=TRUE}
\setkeys{Gin}{width=0.5\textwidth}
\title{A guide to function objects (class \fv\ and \env) in \spst}
\author{Adrian Baddeley, Rolf Turner and Ege Rubak}
\date{For \spst\ version \texttt{\Sexpr{sversion}}}
\maketitle
\thispagestyle{empty}
\begin{abstract}
This vignette explains how to use and manipulate
function objects (\objs{}\fv) and envelope objects (\objs{}\env)
in the \spst\ package.
\end{abstract}
\setcounter{tocdepth}{1}
\tableofcontents
\newpage
\section{Introduction}
\subsection{Functional summary statistics}
An \obj\fv\ (`function value table') is a
convenient way of storing several different
estimates of the same function.
It is common practice to summarise a spatial point pattern dataset
using a summary function, such as Ripley's \Kfun\ $K(r)$, rather than a
single numerical summary value.
Typically, an empirical estimate of the function, obtained from the data,
will be compared with the `theoretical' version of the function
that would be expected if the point pattern was completely random.
There may be several different empirical estimates of the function,
based on different estimation techniques, and we also want to compare
these estimates with one another.
The \spst{} family of packages makes it very easy to
compute and handle multiple versions of a summary function.
Taking the Finnish Pines data \texttt{finpines} as an example,
we can compute and plot estimates of Ripley's \Kfun\ by typing
<<>>=
K <- Kest(finpines)
@
<<>>=
plot(K)
@
The plot shows several curves, which
represent the different empirical estimates of the \Kfun\
(namely the
isotropic correction $\widehat K_{\mbox{\scriptsize iso}}(r)$,
translation correction $\widehat K_{\mbox{\scriptsize trans}}(r)$,
and border correction $\widehat K_{\mbox{\scriptsize bord}}(r)$)
and also the theoretical value $K_{\mbox{\scriptsize pois}}(r)$
that would be expected if the point pattern was completely random.
All these functions are plotted against the distance argument $r$.
The object \texttt{K}
belongs to class \fv{} (``function value table'').
It is a data frame
(that is, it also belongs to the class \class{data.frame})
with attributes giving extra information such as the
recommended way of plotting the function.
One column of the data frame contains evenly spaced values of
the distance argument $r$, while the other columns contain
estimates of the value of the function, or the theoretical value
of the function under complete spatial randomness (CSR), corresponding to these distance values.
More information is given by the print method \texttt{print.fv},
which can be invoked just by typing the name of the object:
<<>>=
K
@
The output indicates that the columns in the
data frame are named \texttt{r}, \texttt{theo}, \texttt{border},
\texttt{trans}, and \texttt{iso}, and explains their contents.
For example,
the column \texttt{iso} contains estimates of the \Kfun{} using the
isotropic edge correction.
This column is labelled in the plot by the \R\ expression
\texttt{hat(K)[iso](r)} which is rendered
as the mathematical notation $\widehat K_{\mbox{\scriptsize iso}}(r)$.
The function argument in an \class{fv} object is usually,
but not always, called \texttt{r}. (Counterexamples include
\fun{transect.im} which returns an \fv\ object with
function argument \texttt{t}, and \fun{roc} which returns an \fv\ object
with function argument \texttt{p}.)
The command \texttt{plot(K)} is dispatched to the method
\texttt{plot.fv} to generate the graphic shown above. The plot method
uses the auxiliary information contained in \texttt{K} to
attach meaningful labels to the graphic.
Stripping off the auxiliary information we can inspect the
data frame itself:
<<>>=
head(as.data.frame(K))
@
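As a quick check of the structure described above, the following sketch
(not evaluated here) inspects the object \texttt{K} using only base \R{}
functions. It assumes that \texttt{K} has been computed as shown earlier.
<<eval=FALSE>>=
class(K)          # class vector of the object, including "fv"
is.data.frame(K)  # an fv object is also a data frame
names(K)          # names of the columns, such as r, theo and iso
dim(K)            # number of rows (distance values) and columns
@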
This vignette explains how to plot, manipulate and create objects of
class \fv.
\subsection{Simulation envelopes}
Simulation envelopes of summary functions
are often used to assess statistical significance
in early stages of analysis. The \spst{} command \texttt{envelope}
generates simulation envelopes of a summary function:
<<>>=
E <- envelope(finpines, Kest, nsim=39)
@
<<>>=
plot(E)
@
In this example, the command \verb!E <- envelope(finpines, Kest, nsim=39)!
generates 39 simulated point patterns according to a completely random process,
computes the estimated \Kfun{} for each simulated pattern,
and finds the simulation envelopes
by identifying the pointwise minimum and maximum of the 39 simulated functions.
The result \texttt{E} is again an \obj\fv, but additionally belongs to the
class \env, and contains additional information about how the
envelopes were computed.
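The pointwise envelope calculation can be sketched by hand.
The code below is an illustration only and is not the implementation used
by \texttt{envelope}; it assumes that each simulation is a realisation of a
homogeneous Poisson process with the same estimated intensity as the data.
The names \texttt{X}, \texttt{lam}, \texttt{rvals}, \texttt{sims},
\texttt{lo} and \texttt{hi} are introduced here for the example.
<<eval=FALSE>>=
X <- unmark(finpines)          # discard the marks; only locations are needed
lam <- intensity(X)            # estimated intensity of the data
rvals <- K$r                   # common grid of distances, taken from K
## each column of 'sims' holds the isotropic-correction estimate of K
## for one simulated pattern
sims <- replicate(39, {
  Xsim <- rpoispp(lam, win=Window(X))
  Kest(Xsim, r=rvals, correction="isotropic")$iso
})
lo <- apply(sims, 1, min)      # pointwise minimum over the simulations
hi <- apply(sims, 1, max)      # pointwise maximum over the simulations
@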
In the resulting plot, generated by the method \texttt{plot.envelope},
the region between the upper and lower
simulation envelopes is filled in grey shading. The solid black line
is the estimated \Kfun{} for the original \texttt{finpines} dataset,
and the dashed red line is the theoretical \Kfun{} for a completely
random pattern.
There is a lot of auxiliary information, displayed by \texttt{print.envelope}:
<<>>=
E
@
This vignette also explains how to plot, manipulate and create \objs\env.
Since envelope objects also belong to class \fv, the vignette first focuses
on the capabilities of class \fv.
\subsection{Why bother?}
\label{S:whybother}
Any self-respecting programmer would regard it as a
trivial task to organise data in a data frame
and plot each column of data as a curve in a graph.
Although the task is trivial, it can be time-consuming,
it is prone to error, and it can take many attempts to get it
exactly right. The authors of \spst\ developed the class \fv\
to make this job easier.
The class \fv{} is designed to
\begin{itemize}
\item
support \emph{multiple versions of a function},
such as the different estimates of the \Kfun{} obtained using
different edge corrections, the theoretical version of the \Kfun{}
for a completely random process, the upper and lower simulation envelopes
of the \Kfun, and so on.
\item
do the \emph{``book-keeping''} about the different versions
of the function, such as the names of the different columns.
\item
perform automatic \emph{plotting} of the function, handling all the details
of layout and labelling, including generating the mathematical labels
for each curve.
\item
support \emph{calculations} that will be applied automatically
to all the versions of the function.
\item
support \emph{conversion} to other data types in base \R,
such as data frames and functions.
\end{itemize}
For example, Besag's $L$ function is defined as
$L(r) = \sqrt{K(r)/\pi}$. Since we have already computed the
\Kfun{} in the example above, we can compute and plot the $L$-function
just by typing
<<>>=
L <- sqrt(K/pi)
@
<<>>=
plot(L)
@
Several kinds of magic have happened here:
\begin{itemize}
\item The expression \texttt{sqrt(K/pi)},
where \texttt{K} is an \obj\fv, has been evaluated
automatically by calculating $\sqrt{K(r)/\pi}$ for each of the versions of the
function stored in \texttt{K};
\item The internal data in the object \texttt{K}, which
provide mathematical labels for each version of the \Kfun,
have been modified according to the algebraic operation
that was just performed;
\item The result has been saved as a new \obj\fv{} named \texttt{L};
\item The \texttt{plot} method has correctly displayed each version
of the modified function using the modified mathematical labels, both
on the vertical axis and in the legend box;
\item The \texttt{plot} method has \textbf{automatically computed the position
of the legend box} to prevent it from overlapping the plotted curves;
\item The unit of length for the function argument has been
correctly saved in the object \texttt{L} and
correctly reported on the horizontal axis label.
\end{itemize}
The class \env{} extends the class \fv{} to handle additional information
about how the envelopes were computed. The code supporting the class \env{}
performs many of the ``trivial'' but error-prone calculations involving
envelopes. An \obj\env{} can also contain the simulated data (the point patterns
and/or the summary functions) that were used to compute the envelopes,
which makes it possible to re-use the simulated data to compute a different
version of the envelope.
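As a hedged illustration of re-using stored simulations, the sketch below
relies on the arguments \texttt{savefuns} and \texttt{global} of
\texttt{envelope}, as described in its help file.
The object names \texttt{E1} and \texttt{E2} are introduced here,
and the code is not evaluated in this vignette.
<<eval=FALSE>>=
## store the simulated K-functions inside the envelope object
E1 <- envelope(finpines, Kest, nsim=39, savefuns=TRUE)
## compute a different (global) envelope from the stored functions,
## without generating any new simulations
E2 <- envelope(E1, global=TRUE)
@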
\newpage
\section{Plotting}
\label{S:plot.fv}
\subsection{Default plot}
If \texttt{f} is an object of class \class{fv},
the command \texttt{plot(f)} is dispatched to the method \fun{plot.fv}.
The default behaviour of \texttt{plot(f)} is to generate a plot
containing several curves, each representing a different
version of the same target function,
plotted against the distance argument $r$.
<<>>=
plot(Gest(finpines))
@
<<>>=
aa <- plot(Gest(finpines))
@
Here \texttt{Gest} computes estimates of the nearest-neighbour
distance distribution function $G(r)$. The plot shows three
empirical estimates of $G(r)$ for the \texttt{finpines}
dataset, together with the `theoretical'
curve $\pois G(r)$ expected for a completely random pattern,
all plotted against the distance argument $r$.
The legend indicates the meaning of each curve.
The main title identifies the object in \R\ that was plotted.
The return value from \fun{plot.fv} is a data frame containing
more detailed information about the meaning of the curves.
For the plot generated above, the return value is
<<>>=
aa <- plot(Gest(finpines))
aa
@
<<>>=
aa
@
Here \texttt{lty} and \texttt{col} are the graphics parameters
controlling the line type and line colour, and \texttt{label} is
the mathematical notation for each edge-corrected estimate,
in the syntax recognised by \R{} graphics functions.
The plot generated by \texttt{plot.fv} uses the base \R\ graphics system
(not \texttt{lattice} or \texttt{ggplot})
and is affected by graphics parameters
specified by \texttt{par()}.
\subsection{Modifying parameters of the default plot}
The default plot can easily be modified:
\begin{description}
\item[margin space:]
To change the amount of white space around the plot,
use \texttt{par('mar')}.
\item[main title:]
use \texttt{main=""} to suppress the main title.
\item[legend:]
Set \texttt{legend=FALSE} to suppress the legend.
Use the argument \texttt{legendargs} to modify the legend.
The legend position is
automatically computed to avoid overlap with the plotted curves,
but this can be overridden by \texttt{legendpos}.
\item[range of values:]
Use \texttt{xlim} and \texttt{ylim} to specify the ranges
of values on the $x$ and $y$ axes.
\textbf{See the note below about the ``recommended range''.}
Use \texttt{ylim.covers} to specify a numerical value or values that
must be covered by the $y$ axis. For example, \texttt{ylim.covers=0}
means that the $y$ axis will always include the origin.
\end{description}
For further information, see \texttt{help(plot.fv)}.
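The modifications listed above can be combined in a single call.
The following sketch (not evaluated here) assumes that \texttt{K} is the
\fv\ object computed from the \texttt{finpines} data; the particular
parameter values are arbitrary choices made for illustration.
<<eval=FALSE>>=
op <- par(mar=c(4,4,1,1))     # reduce the white space around the plot
## no main title, legend moved, y axis forced to include 0
plot(K, main="", legendpos="bottomright", ylim.covers=0)
## no legend, user-specified range on the x axis
plot(K, main="", legend=FALSE, xlim=c(0, 1))
par(op)                       # restore the previous graphics parameters
@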
\subsection{Recommended range and recommended columns}
The default plot of an \fv\ object does not necessarily display
all the data that is contained in the object:
\begin{description}
\item[shorter range of distances:]
the range of values of the distance argument $r$
displayed in the default plot may be shorter than the
range of values actually contained in the data frame.
\item[not all columns of data:]
the plot may not display all the columns of data contained in the data frame.
\end{description}
This happens because an \obj\fv\ contains ``recommendations''
about the range of distances that should be displayed,
and about the columns of data that should be shown.
These recommendations are based on standard statistical practice.
The recommendations are followed when the default plot is generated,
unless they are specifically overridden.
Consider this example:
<<>>=
G <- Gest(finpines)
G
@
The printout shows the range of values of \texttt{r} that are present
in the table as the `\texttt{available range}'. It also gives a
`\texttt{recommended range}' which is generally shorter than the
available range. \emph{The default plot of the object will only show the
function values over the recommended range} and not over the full range of
values available. This is done so that the interesting detail
is clearly visible in the default plot. Values outside the recommended range
may be unreliable due to increased variance or bias, depending on the
edge correction. To prevent this behaviour and use the full range
of function values available, set \texttt{clip.xlim=FALSE} in the
plot command. Alternatively, specify the desired range of \texttt{r} values
using the argument \texttt{xlim} in the plot command.
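The two ways of overriding the recommended range can be sketched as follows
(not evaluated here). The limits given to \texttt{xlim} are an arbitrary
choice for illustration.
<<eval=FALSE>>=
plot(G, clip.xlim=FALSE)    # display the full available range of r
plot(G, xlim=c(0, 0.5))     # display a user-specified range of r
@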
The printout also says that the default plot formula is \verb! . ~ r !
where ``\verb!.!'' stands for \texttt{"km", "rs", "han", "theo"}.
This means that the default plot will display only the columns
named \texttt{"km", "rs", "han"} and \texttt{"theo"} and will \textbf{not}
display the columns named \texttt{"hazard"} and \texttt{"theohaz"} which
are mentioned in the printout. This is consistent with the graphic shown
above.
In this example, the column named \texttt{"hazard"} is an estimate
of the \emph{hazard rate} $h(r) = G'(r)/(1-G(r))$ of the nearest neighbour
distance function, rather than an estimate of $G(r)$ itself.
The column named \texttt{"theohaz"} is the corresponding
theoretical value of the hazard rate, expected
if the point pattern is completely random.
It makes sense that the hazard rate $h(r)$ and distribution function $G(r)$
should not normally be plotted together.
Therefore when \texttt{Gest} is executed, it designates
\texttt{"km", "rs", "han", "theo"} as the ``recommended columns''
that should be displayed by default, and it stores this information
in the resulting object \texttt{G}. When \texttt{plot(G)}
is executed, \texttt{plot.fv} uses this information to determine which
columns are to be plotted.
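The columns that are not recommended can still be inspected and plotted
explicitly. The sketch below (not evaluated here) uses the column names
\texttt{hazard} and \texttt{theohaz} reported in the printout of \texttt{G}.
<<eval=FALSE>>=
fvnames(G, ".")     # the recommended columns
fvnames(G, ".a")    # all columns of function values
## plot the hazard rate estimate and its theoretical value
plot(G, cbind(hazard, theohaz) ~ r)
@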
\subsection{Plot specified by a formula}
\label{S:plot.formula}
Different kinds of plots can be specified using a \texttt{formula}
as the second argument to \texttt{plot.fv}.
The left side of the formula represents what variables will
be plotted on the vertical ($y$) axis, and the right side determines the
variable on the horizontal ($x$) axis.
For example, in the object
\texttt{K <- Kest(finpines)}, the column named \texttt{iso}
contains the values of the isotropic correction estimate.
To plot the isotropic correction estimate against $r$, simply do
<<>>=
plot(K, iso ~ r)
@
In \fun{plot.fv}, both sides of the plot formula are interpreted
as mathematical expressions, so that operators like
`\verb!+!', `\verb!-!', `\verb!*!', `\verb!/!'
have their usual meaning in arithmetic.
The right-hand side of the formula can be any expression
that, when evaluated, yields a numeric vector, and the left-hand side is any
expression that evaluates to a vector or matrix of compatible
dimensions.
If the left-hand side of the formula, when evaluated, yields a matrix,
then each column of that
matrix is plotted against the specified $x$ variable as a separate curve.
In particular the left-hand side of the formula may invoke the
function \fun{cbind} to indicate that several different curves
should be plotted. For example, to plot only the
isotropic correction estimator and the theoretical curve:
<<>>=
plot(K, cbind(iso, theo) ~ r)
@
Notice that, in this example, \texttt{plot.fv} is clever enough to recognise
that \texttt{iso} and \texttt{theo} are both versions of the \Kfun\ $K(r)$,
and to decide that the appropriate label for the vertical axis is just $K(r)$.
The plot formula may also involve the names of constants like \texttt{pi},
standard functions like \texttt{sqrt}, and some special abbreviations listed
in Table~\ref{tab:fvnames}.
\begin{table}[!h]
\begin{tabular}{ll}
\verb!.x! & argument of function \\
\verb!.y! & best estimate of function \\
\verb!.! & all recommended estimates of function \\
\verb!.a! & all columns of function values \\
\verb!.s! & upper and lower limits of shading
\end{tabular}
\caption{
Recognised abbreviations for columns of an \class{fv} object.
}
\label{tab:fvnames}
\end{table}
The symbol \verb!.x! represents the function argument, usually \texttt{"r"}.
The symbol \verb!.y! represents one of the columns of function values
which has been designated as the `best' estimate, for use by some other
commands in \spst.
The symbol `\verb!.!' represents the `recommended' estimates.
The default plotting formula is \verb!. ~ .x! indicating that
each of the recommended estimates will be plotted against
the function argument. The formula \verb!.y ~ .x! means that the
best estimate of the function will be plotted against the function argument.
To expand these abbreviations for a particular \fv\ object,
use the function \texttt{fvnames}.
<<>>=
fvnames(K, ".y")
fvnames(K, ".")
@
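The other abbreviations in Table~\ref{tab:fvnames} can be expanded
or used in a plot formula in the same way.
A brief sketch, not evaluated here:
<<eval=FALSE>>=
fvnames(K, ".x")    # name of the function argument
fvnames(K, ".a")    # names of all columns of function values
plot(K, .y ~ .x)    # best estimate plotted against the function argument
@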
A plot formula can be used to specify a transformation
that should be applied to the function values before they are displayed.
For example, to subtract the theoretical Poisson value
from each of the function estimates:
<<>>=
plot(K, . - theo ~ r)
@
Alternatively one could plot the function estimates \emph{against} the
Poisson value:
<<>>=
plot(K, . ~ theo)
@
This plot has some theoretical support.
In the discussion of Ripley's paper, Cox \cite{cox77discuss}
proposed that $\widehat K(r)$ should be plotted against $r^2$,
which is almost equivalent. We can follow Cox's recommendation
exactly:
<<>>=
plot(K, . ~ r^2)
@
The mathematical labels for the plot axes, and for the individual curves,
are constructed automatically by \spst\ from the plot formula.
If the plot formula involves the names of external variables,
these will be rendered in Greek where possible. For example, to
plot the average number of trees surrounding a typical tree in the
Swedish Pines data,
<<>>=
lambda <- intensity(swedishpines)
plot(Kest(swedishpines), lambda * . ~ r)
@
Here we use the name \texttt{lambda}
so that it will be rendered as the Greek letter $\lambda$ in
the graphics: the $y$-axis will be labelled $\lambda K(r)$.
\section{Calculating with an \fv\ object}
This section explains how to do calculations involving a single \obj\fv.
The next section covers calculations involving several \objs\fv.
\subsection{Arithmetic and mathematical operators}
Arithmetic and mathematical operations on an \obj\fv\
can be performed by simply writing the arithmetic expression
involving the name of the object. The following are valid:
<<>>=
K <- Kest(cells)
K/pi
sqrt(K/pi)
@
These inline calculations are performed by the operators
\texttt{Ops.fv} and \texttt{Math.fv}.
The operation is applied to each column of \emph{function values};
the function argument \texttt{r} will not be affected.
The result is another \obj\fv\ with the same number of columns,
with the same column names, but with appropriately adjusted
auxiliary information.
The expression can involve a command which returns an \obj\fv:
<<>>=
sqrt(Kest(cells)/pi)
@
The auxiliary information contained in the resulting object
will be slightly less elegant in this case.
These arithmetic and mathematical operations are applied only to the
\emph{recommended} columns of function values
identified by \texttt{fvnames(K, ".")}.
\subsection{Other vectorised operations}
Functions such as \texttt{pmax} and \texttt{cumsum} apply to vector data,
but are not recognised as arithmetic or mathematical operators
by the \R\ parser,
so they are not covered by \texttt{Ops.fv} and \texttt{Math.fv}.
For expressions involving \texttt{pmax} and \texttt{cumsum} (or indeed
any algebraic expression whatsoever),
use the command \texttt{eval.fv} to perform the
calculation simultaneously for each column of function values:
<<>>=
Kpos <- eval.fv(pmax(0, K))
@
The result \texttt{Kpos} is another \obj\fv\
in which the function values are all non-negative.
The first argument of \texttt{eval.fv} should be an expression
involving the \textbf{name} of the \obj\fv.
By default, the calculation is only applied to the
\emph{recommended} columns of function values
identified by \texttt{fvnames(K, ".")}.
This may be overridden by setting \texttt{dotonly=FALSE} in the call
to \texttt{eval.fv}.
The computations of \texttt{Ops.fv} and \texttt{Math.fv} are implemented
using \texttt{eval.fv} but there may be slight
differences in the handling of the
auxiliary information.
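The effect of \texttt{dotonly} is easiest to see on an object that has
columns outside the recommended set, such as the result of \texttt{Gest}.
The following sketch is not evaluated here, and the object names
\texttt{Gc}, \texttt{G1} and \texttt{G2} are introduced for the example.
<<eval=FALSE>>=
Gc <- Gest(cells)
G1 <- eval.fv(pmax(0, Gc))                  # recommended columns only
G2 <- eval.fv(pmax(0, Gc), dotonly=FALSE)   # all columns of function values
names(G1)     # compare the columns present in the two results
names(G2)
@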
\subsection{Calculations involving specific columns}
\label{p:with.fv}
To manipulate or combine one or more columns of data in an \class{fv} object,
it is typically easiest to use \fun{with.fv},
a method for the generic \fun{with}.
This behaves in a very similar way to \texttt{with.data.frame}.
For example:
<<>>=
Kr <- Kest(redwood)
z <- with(Kr, iso - theo)
x <- with(Kr, r)
@
The results \texttt{x} and \texttt{z} are numeric
vectors, where \texttt{x} contains the values of the distance
argument $r$, and \texttt{z} contains the difference between
the columns \texttt{iso} (isotropic correction estimate) and
\texttt{theo} (theoretical value for CSR) for the \Kfun{}
estimate of the redwood seedlings data. For this to work, we
have to know that \texttt{Kr} contains columns named \texttt{r},
\texttt{iso} and \texttt{theo}. Printing
the object will reveal this information, as would typing
\texttt{names(Kr)} or \texttt{colnames(Kr)}.
The general syntax is \texttt{with(X, expr)} where \texttt{X}
is an \class{fv} object and \texttt{expr} can be any expression
involving the names of columns of \texttt{X}. The expression
can include functions, so long as they are capable of operating
on numeric vectors. The expression can also involve the abbreviations
listed in Table~\ref{tab:fvnames}:
<<>>=
Kcen <- with(Kr, . - theo)
@
This command subtracts the
`theoretical' value from all the available edge correction
estimates. The result \texttt{Kcen} is another \class{fv} object.
You can also get a result which is a vector or single number:
<<>>=
with(Kr, max(abs(iso-theo)))
@
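Because \texttt{with} can return ordinary numeric vectors,
the results can be passed to any base \R{} function.
As a small sketch (not evaluated here), the distance at which the
isotropic correction estimate deviates most from the theoretical value
could be found as follows; the names \texttt{dev} and \texttt{rr}
are introduced for the example.
<<eval=FALSE>>=
dev <- with(Kr, abs(iso - theo))   # absolute deviation at each distance
rr  <- with(Kr, r)                 # the corresponding distance values
rr[which.max(dev)]                 # distance where the deviation is largest
@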
\subsection{Extracting data}
An \obj\fv\ is essentially a data frame with additional attributes.
It contains the values of the desired function (such as $K(r)$)
at a finely spaced grid of values of the function argument $r$.
The data frame can be extracted (and the additional attributes
removed) using \texttt{as.data.frame.fv}:
<<>>=
df <- as.data.frame(K)
@
A single column of values can be extracted using the \verb!$!
operator in the usual way: \verb!K$iso! %$
would extract a vector containing the isotropic correction estimates of $K(r)$.
The subset extraction operator `\verb![!' has a method %]
for \class{fv} objects. This always returns another
\class{fv} object, so it will refuse to remove the column
containing values of the function argument \texttt{r}, for example.
To override this refusal,
convert the object to a data frame using \fun{as.data.frame}
and then use `\verb![!': % ]
the result will be a data frame or a vector.
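The conversion route can be sketched as follows (not evaluated here),
assuming that \texttt{K} contains columns named \texttt{iso} and
\texttt{trans}. The names \texttt{Kdf}, \texttt{ests} and \texttt{isovals}
are introduced for the example.
<<eval=FALSE>>=
Kdf <- as.data.frame(K)             # plain data frame, attributes removed
ests <- Kdf[, c("iso", "trans")]    # data frame without the column r
isovals <- Kdf[, "iso"]             # numeric vector
all.equal(isovals, K$iso)           # same values as extracted with $
@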
Commands designed for data frames often work for \class{fv} objects as well.
The functions \texttt{head} and \texttt{tail} extract the top (first few rows)
and bottom (last few rows) of a data frame. They also work on \class{fv}
objects: the result is a new \class{fv} object containing the
function values for a short interval of $r$ values
at the beginning or end of the range. The function \texttt{subset}
selects designated subsets of a data frame using an elegant syntax
and this also works on \class{fv} objects. To restrict \texttt{K}
to the range $r < 0.1$ and remove the border correction,
<<>>=
Ko <- subset(K, r < 0.1, select= -border)
@
\subsection{Converting to a true function}
An \obj\fv\ is meant to represent a function, but it contains only sample values
of the function at a grid of values of the function argument.
The table of function values can also be converted to a true function
in the \R{} language using \fun{as.function}. This makes it easy
to evaluate the function at any desired distance $r$.
<<>>=
Ks <- Kest(swedishpines)
kfun <- as.function(Ks)
kfun(9)
@
By default, the result \texttt{kfun} is a function in \R,
with a single argument
\texttt{r} (or whatever the original function argument was called).
The new function accepts numeric values or numeric vectors of distance values,
and returns the values of the `best' estimate of the function,
interpolated linearly between entries in the table.
If one of the other function estimates is required,
use the argument \texttt{value} to \fun{as.function} to select it.
<<>>=
kt <- as.function(Ks, value="trans")
kt(9)
@
To retain the option to select any one of the function estimates, type
<<>>=
kf <- as.function(Ks, value=".")
kf(9, "trans")
@
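Since the converted objects are ordinary \R{} functions, they accept
vectors of distances and can be passed to other commands.
A brief sketch, not evaluated here, using the functions \texttt{kfun}
and \texttt{kf} created above; the distance values are arbitrary:
<<eval=FALSE>>=
kfun(c(3, 6, 9))              # evaluate at several distances at once
curve(kfun, from=0, to=20)    # plot the interpolated function
kf(c(3, 6, 9), "iso")         # choose the estimate when calling the function
@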
\subsection{Special operations}
\label{S:manip.fv}
An \class{fv} object can be manipulated using the operations
listed in Table~\ref{tab:fvmethods}.
\begin{table}[!h]
\begin{tabular}[c]{ll}
\texttt{f} & print a description \\
\texttt{print(f)} & print a description \\
\texttt{plot(f)} & plot the function estimates \\
\texttt{as.data.frame(f)} & strip extra information (returns a data frame) \\
\verb!f$iso! & extract column named \texttt{iso} (returns a numeric vector) \\
\verb!f[i,j]! & extract subset (returns an \class{fv} object) \\
\verb!subset(f, ...)! & extract subset (returns an \class{fv} object) \\
\texttt{with(f, expr)} & perform calculations with columns of data frame\\
\texttt{eval.fv(expr)} & perform calculations with several \class{fv} objects \\
\texttt{bind.fv(f, d)} & combine an \class{fv} object \texttt{f} and data frame \texttt{d} \\
\texttt{min(f)}, \texttt{max(f)}, \texttt{range(f)} &
range of function values \\
\texttt{Smooth(f)} & apply smoothing to function values \\
\texttt{deriv(f)} & derivative of function\\
\texttt{stieltjes(g,f)} & compute Stieltjes integral with respect to \texttt{f} \\
\texttt{as.function(f)} & convert to a function
\end{tabular}
\caption{Operations for manipulating an \class{fv} object \code{f}.}
\label{tab:fvmethods}
\end{table}
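A few of the operations in Table~\ref{tab:fvmethods} are sketched below
(not evaluated here). Default settings are used throughout, and the
object names \texttt{Kc}, \texttt{Ksm} and \texttt{dK} are introduced
for the example.
<<eval=FALSE>>=
Kc <- Kest(cells)
range(Kc)            # range of function values
Ksm <- Smooth(Kc)    # smoothed version of each function estimate
dK <- deriv(Kc)      # numerical derivative, returned as an fv object
plot(dK)
@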
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Calculating with several \fv\ objects}
\subsection{Arithmetic and mathematical operators}
Arithmetic and mathematical operations involving several \objs\fv\
can be performed by simply writing the arithmetic expression
involving the objects:
<<>>=
Kcel <- Kest(cells)
Kred <- Kest(redwood)
Kdif <- Kcel - Kred
@
These inline calculations are performed by the operators
\texttt{Ops.fv} and \texttt{Math.fv}.
The operation is applied to each column of \emph{function values};
the function argument \texttt{r} will not be affected.
The result is another \obj\fv\ with the same number of columns,
with the same column names, but with appropriately adjusted
auxiliary information.
The \fv\ objects should be `compatible'
in the sense that they have the same column names,
and the same vector of $r$ values. However, \texttt{eval.fv} will
attempt to reconcile incompatible objects.
(The \spst\ generic function \fun{compatible} determines whether two or more
objects are compatible, and the generic function \fun{harmonise}
makes them compatible, if possible.)
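The following sketch (not evaluated here) shows how compatibility
could be checked and enforced before doing arithmetic.
The name \texttt{H} is introduced for the example.
<<eval=FALSE>>=
Kcel <- Kest(cells)
Kred <- Kest(redwood)
compatible(Kcel, Kred)         # TRUE if column names and r values agree
H <- harmonise(Kcel, Kred)     # list of fv objects made compatible
compatible(H[[1]], H[[2]])
@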
The expression can involve sub-expressions which return \objs\fv:
<<>>=
Kest(cells) - Kest(redwood)
@
The auxiliary information contained in the resulting object
will be slightly less elegant in this case.
\subsection{Other vectorised operations}
For expressions involving \texttt{pmax} and \texttt{cumsum} (or indeed
any algebraic expression whatsoever),
use the command \texttt{eval.fv} to perform the
calculation simultaneously for each column of function values.
<<>>=
Kcel <- Kest(cells)
Kred <- Kest(redwood)
Kmax <- eval.fv(pmax(Kcel, Kred))
@
The result \texttt{Kmax} is another \obj\fv.
The first argument of \texttt{eval.fv} should be an expression
involving the \textbf{names} of the \objs\fv.
By default, the calculation is only applied to the
\emph{recommended} columns of function values
identified by \texttt{fvnames(x, ".")} where \texttt{x} is the \obj\fv.
This may be overridden by setting \texttt{dotonly=FALSE} in the call
to \texttt{eval.fv}.
The expression is not permitted to contain sub-expressions
that evaluate to \objs\fv. However, you can use the argument \texttt{envir} to
supply such sub-expressions:
<<>>=
Kmax <- eval.fv(pmax(Kcel, Kred),
                envir=list(Kcel=Kest(cells), Kred=Kest(redwood)))
@
The computations of \texttt{Ops.fv} and \texttt{Math.fv} are implemented
using \texttt{eval.fv} but there may be slight
differences in the handling of the auxiliary information.
\subsection{Combining objects}
Several \class{fv} objects can be combined using the operations
listed in Table~\ref{tab:fvmethods.multi}.
\begin{table}[!h]
\begin{tabular}[c]{ll}
\texttt{eval.fv(expr)} & perform calculations with several \class{fv} objects \\
\verb!cbind(f1, f2, ...)! & combine \class{fv} objects \texttt{f1, f2, ...} \\
\texttt{bind.fv(f, d)} & combine an \class{fv} object \texttt{f} and data frame \texttt{d} \\
\verb!collapse.fv(f1, f2, ...)! &
combine several redundant \class{fv} objects \\
\verb!compatible(f1, f2, ...)! &
check whether \class{fv} objects are compatible \\
\verb!harmonise(f1, f2, ...)! &
make \class{fv} objects compatible
\end{tabular}
\caption{Operations for manipulating
several \class{fv} objects \code{f1}, \code{f2}.}
\label{tab:fvmethods.multi}
\end{table}
Use \code{\link{cbind.fv}} to combine several \code{"fv"} objects.
Use \code{\link{bind.fv}} to glue additional columns onto an existing
\code{"fv"} object.
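Both operations are sketched below (not evaluated here).
The example computes two estimates of the \Kfun{} separately and combines
them, then attaches a new column derived from an existing one.
The column name \texttt{bmono}, its label and its description are
hypothetical choices made for this illustration.
<<eval=FALSE>>=
K1 <- Kest(cells, correction="border")
K2 <- Kest(cells, correction="iso")
## remove the column 'theo' from K2 to avoid duplication
K2 <- K2[, names(K2) != "theo"]
Kboth <- cbind(K1, K2)
## force the border estimate to be non-decreasing, and attach it to K1
bm <- cumsum(c(0, pmax(0, diff(K1$border))))
K1m <- bind.fv(K1, data.frame(bmono=bm),
               labl="%s[bmo](r)",
               desc="monotone border-corrected estimate of %s",
               preferred="bmono")
@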
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Creating \fv\ objects from raw data}
This section explains how to create \objs\fv\ from raw numerical data.
This would be useful if you are implementing a completely new kind of
summary function.
Subsection~\ref{S:creator} explains how to create an \obj\fv\
by providing the numerical data and the required auxiliary information.
Section~\ref{S:as.fv} describes an easier way to convert a data frame
(or similar object) to an \obj\fv\ without specifying the auxiliary
information, using default rules for the auxiliary information.
Section~\ref{S:compileK} describes special tools
\texttt{compileK, compilepcf, compileCDF} for creating an
\obj\fv\ from a numeric vector of distance values, using the rules that
apply to the \Kfun, or the pair correlation function,
or the nearest-neighbour distance distribution function.
\subsection{The creator function \fun{fv}}
\label{S:creator}
\subsubsection{The creator function}
The low-level function \code{fv} is used to create an object of
class \code{"fv"} from raw numerical data. It has the following syntax:
\begin{verbatim}
fv(x, argu = "r", ylab = NULL, valu, fmla = NULL,
alim = NULL, labl = names(x), desc = NULL,
unitname = NULL, fname = NULL, yexp = ylab)
\end{verbatim}
The arguments are as follows:
\begin{itemize}
\item
\code{x} contains the numerical data. It should be a data frame,
in which one column gives the values of the function argument for which
the function has been evaluated, and at least one other column
contains the corresponding values of the function.
These other columns typically give the values of different versions
or estimates of the same function,
for example, different estimates of the \Kfun{}
obtained using different edge corrections.
However, they may also contain the values of related functions
such as the derivative or hazard rate.
\item
\code{argu} specifies the name of the column of
\code{x} that contains the values of the function argument
(typically \code{argu="r"} but this is not compulsory).
\item
\code{valu} specifies the name of another column
that contains the `recommended' estimate of the function.
It will be used to provide function values in those situations where
a single column of data is required. For example,
\code{envelope} computes its simulation envelopes
using the recommended value of the summary function.
\item
\code{fmla} specifies the default plotting behaviour.
It should be a formula, or a string that can be converted to a
formula. Variables in the formula are names of columns of \code{x}.
See \code{plot.fv} for the interpretation of this
formula.
\item
\code{alim} specifies the recommended range of the
function argument. This is used in situations where statistical
theory or statistical practice indicates that the computed
estimates of the function are not trustworthy outside a certain
range of values of the function argument. By default,
\code{plot.fv} will restrict the plot to this range.
\item
\code{fname} is a character string (or a vector of 2 character strings)
giving the name of the function itself.
For example, the \Kfun{} would have \code{fname="K"},
while the inhomogeneous \Kfun\ has \code{fname=c("K", "inhom")}.
\item
\code{ylab} is a mathematical expression
for the function value, used when labelling an axis
of the plot, or when printing a description of the
function. It should be an \R{} language object.
For example the \Kfun's mathematical name $K(r)$ is rendered
by \code{ylab=quote(K(r))}.
\item
\code{yexp} is another mathematical expression for the function value.
If \code{yexp} is present, then \code{ylab} will be
used only for printing, and \code{yexp} will be used for
annotating axes in a plot.
(Otherwise \code{yexp} defaults to \code{ylab}).
\item
\code{labl} is a character vector specifying plot labels
for each column of \code{x}. These labels will appear on the
plot axes (in non-default plots), legends and printed output.
Entries in \code{labl}
may contain the string \code{"\%s"} which will be replaced
by \code{fname} when plotted or printed.
For example the border-corrected estimate
of the \Kfun{} has label \code{"\%s[bord](r)"} which
becomes \code{"K[bord](r)"} when it is used in \texttt{plot.fv}
or \texttt{print.fv}.
\item
\code{desc} is a character vector containing intelligible
explanations of each column of \code{x}. Entries in
\code{desc} may contain the string \code{"\%s"} which will be replaced
by \code{ylab}. For example the border correction estimate of the
\Kfun{} has description \code{"border correction estimate of \%s"}.
This will be replaced by
\code{"border correction estimate of K(r)"} when it is used in
\texttt{print.fv}.
\item
\code{unitname} is the name of the unit of length
for the \underline{function argument}. Typically the function argument
\code{"r"} represents distance between points. The distance values are
typically expressed in terms of a distance unit, such as metres or feet.
This unit will be printed on the horizontal axis.
The argument \code{unitname} is an object of
class \class{unitname}, or \code{NULL} representing dimensionless values.
\end{itemize}
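As a minimal sketch of how these arguments fit together, the following code
builds an \obj\fv\ representing the area of a disc of radius $r$.
The column names \texttt{a} and \texttt{b}, the function name \texttt{"A"}
and the object name \texttt{Afun} are arbitrary choices made for this
illustration only.
<<eval=FALSE>>=
library(spatstat)
## tabulate the function argument and two versions of the function
df <- data.frame(r = seq(0, 5, by=0.1))
df <- transform(df, a = pi * r^2, b = 3 * r^2)
## supply one label and one description for every column, including 'r'
Afun <- fv(df, argu="r", ylab=quote(A(r)),
           valu="a", fmla=cbind(a, b) ~ r,
           alim=c(0, 4),
           labl=c("r", "%s[true](r)", "%s[approx](r)"),
           desc=c("radius of circle",
                  "true area %s",
                  "rough area %s"),
           fname="A")
Afun
@
Printing \texttt{Afun} should show the descriptions with the placeholder
filled in, and \texttt{plot(Afun)} should restrict the horizontal axis to
the range given by \texttt{alim}.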
\subsubsection{Syntax for \texttt{ylab} and \texttt{yexp}}
Mathematical symbols and notation are supported
in \R\ base graphics. The labels on the axes of a graph,
in the body of the graph, and in graph legends, can all include
mathematical notation. The notation has to be encoded as an
\R\ language expression. The decoding is slightly idiosyncratic,
and this affects the programming of the class \fv.
The arguments \code{ylab} and \code{yexp} are mathematical
expressions for the function value: \texttt{ylab} is used when
printing a description of the function,
and \texttt{yexp} is used when labelling an axis.
Usually \texttt{ylab} and \texttt{yexp} are the same.
For example the \Kfun's mathematical name $K(r)$ is rendered
by \code{ylab=quote(K(r))} and \code{yexp=ylab}.
An example where they are different is the multitype \Kfun\ $K_{1,2}(r)$
where we set \code{ylab=quote(Kcross[1,2](r))}
and
\code{yexp=quote(Kcross[list(1,2)](r))}
to get the most satisfactory behaviour.
A useful programming tip is to use \code{substitute} instead of
\code{quote} to insert values of variables into an expression,
e.g. \code{substitute(Kcross[i,j](r), list(i=42,j=97))}
yields the same as \code{quote(Kcross[42, 97](r))}.
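The difference between \code{quote} and \code{substitute} can be checked
in base \R\ without using \spst. In this small sketch the variable names
\texttt{i}, \texttt{j}, \texttt{e1} and \texttt{e2} are chosen only for
illustration.
<<eval=FALSE>>=
## fixed subscripts: quote() captures the expression exactly as typed
e1 <- quote(Kcross[42, 97](r))
## subscripts held in variables: substitute() inserts their values
i <- 42
j <- 97
e2 <- substitute(Kcross[i, j](r), list(i=i, j=j))
## both expressions deparse to the same text
deparse(e1)
deparse(e2)
@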
\subsubsection{Syntax for \texttt{labl}}
The argument \texttt{labl} is a character vector
specifying plot labels
for each column of \code{x}. These labels will appear on the
plot axes (in non-default plots), legends and printed output.
Entries in \code{labl}
may contain the string \code{"\%s"} which will be replaced
by \code{fname} when plotted or printed.
For example the border-corrected estimate
of the \Kfun{} has label \code{"\%s[bord](r)"} which
becomes \code{"K[bord](r)"} when it is used in \texttt{plot.fv}
or \texttt{print.fv}. This mechanism allows the code to adjust the
labels when the object is changed --- for example, to produce the correct
labels in \code{plot(sqrt(K/pi))} as shown in Section~\ref{S:whybother}.
Things become more complicated if \texttt{fname} is a character vector
of length 2. In that case the appropriate expression for the border-corrected
estimate is \verb!"{hat(%s)[%s]^{bord}}(r)"! which becomes
\verb!{hat(K)[inhom]^{bord}}(r)! when it is used in \texttt{plot.fv}
or \texttt{print.fv}.
We strongly recommend using the function \fun{makefvlabel} to create the
appropriate labels. Its syntax is:
\begin{verbatim}
makefvlabel(op=NULL, accent=NULL, fname, sub=NULL, argname="r")
\end{verbatim}
where the arguments are character strings:
\begin{description}
\item[op] is a prefix or operator such as \code{"var"} (rarely used);
\item[accent] is an accent that should be applied to the main function symbol, usually \texttt{"hat"} for empirical estimates;
\item[fname] is the name of the function (usually a single letter or
a character vector of length 2);
\item[sub] is an optional subscript, typically used to discriminate
between different estimates of the function, such as different edge corrections;
\item[argname] is the name of the function argument.
\end{description}
Examples:
<<>>=
makefvlabel(NULL, NULL, "K", "pois")
makefvlabel(NULL, "hat", "K", "bord")
makefvlabel(NULL, "hat", c("K", "inhom"), "bord")
makefvlabel("var", "hat", c("K", "inhom"), "bord")
@
\subsubsection{Syntax for \texttt{desc}}
Each entry of \texttt{desc} is a single character string. It may
contain a \underline{single} instance of \code{"\%s"}, which will be replaced by
the mathematical expression for the function (\texttt{ylab}) when required.
\subsection{Conversion function \fun{as.fv}}
\label{S:as.fv}
The generic function \texttt{as.fv} converts other kinds of data to an \obj\fv.
The methods \fun{as.fv.matrix} and \fun{as.fv.data.frame} provide a lazy way
to convert a table of function data to an \obj\fv. The auxiliary information
is determined by applying default rules.
Other methods apply to classes of objects which intrinsically contain
an \obj\fv, and they simply extract the \fv\ object.
For example, a fitted model of class \class{kppm} contains the
summary function (either the \Kfun{} or the pair correlation function)
that was used to fit the model; so the method \fun{as.fv.kppm} simply extracts
this summary function.
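The following sketch shows the lazy conversion of a data frame.
It assumes that, under the default rules, the first column is taken as the
function argument; the names \texttt{rr}, \texttt{tab} and \texttt{f}
are chosen for illustration.
<<eval=FALSE>>=
library(spatstat)
## a table of function values, with the argument in the first column
rr <- seq(0, 1, length=101)
tab <- data.frame(r=rr, y=rr^2)
## convert it, letting as.fv choose the auxiliary information
f <- as.fv(tab)
f
@
The auxiliary information chosen by default can then be edited
if it is not satisfactory.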
\subsection{compileK, compilepcf, compileCDF}
\label{S:compileK}
A shortcut is provided for programmers wishing to implement
a summary function that is similar to
Ripley's $K$ function, the pair correlation function $g$,
the empty space function $F$ or
the nearest-neighbour distance distribution function $G$.
\subsubsection{$K$ functions and pair correlation functions}
Programmers who wish to implement a summary function similar to
Ripley's $K$ function or the pair correlation function
can use the commands \texttt{compileK} or \texttt{compilepcf}.
These low-level functions construct estimates of the $K$ function or
pair correlation function, or any similar functions, given only
the matrix of pairwise distances and optional weights associated
with these distances.
These functions are useful for code development and for teaching,
because they perform a common task, and do the housekeeping
required to make an object of class \fv\ that represents the
estimated function. However, they are not very efficient.
The basic syntax of \texttt{compileK} and \texttt{compilepcf} is:
<<eval=FALSE>>=
compileK(D, r, weights = NULL, denom = 1, ...)
compilepcf(D, r, weights = NULL, denom = 1, ...)
@
where
\begin{itemize}
\item \texttt{D} is a square matrix giving the distances between all pairs of
points;
\item \texttt{r} is a vector of distance values, equally spaced, at which
the summary function should be calculated;
\item \texttt{weights} is an optional matrix of numerical weights
for the pairwise distances;
\item \texttt{denom} is the denominator for the estimator.
It may be a single number, or a numeric
vector with the same length as \texttt{r}.
\end{itemize}
The command \texttt{compileK} calculates the weighted estimate of the
$K$ function,
\[
K(r) = \frac{1}{v(r)} \sum_i \sum_{j \neq i} w_{i,j} \; 1\{ d_{i,j} \le r\}
\]
and \texttt{compilepcf} calculates the weighted estimate of the pair
correlation function,
\[
g(r) = \frac{1}{v(r)} \sum_i \sum_{j \neq i} w_{i,j}\; \kappa ( d_{i,j} - r)
\]
where $d_{i,j}$ is the distance between spatial points $i$ and $j$, with
corresponding weight $w_{i,j}$, and $v(r)$ is the specified denominator.
Here $\kappa$ is a fixed-bandwidth smoothing kernel.
For a point pattern in two dimensions, the usual denominator $v(r)$
is constant for the $K$ function, and proportional to $r$ for the pair
correlation function:
<<>>=
X <- japanesepines
D <- pairdist(X)
Wt <- edge.Ripley(X, D)
lambda <- intensity(X)
a <- (npoints(X)-1) * lambda
r <- seq(0, 0.25, by=0.01)
K <- compileK(D=D, r=r, weights=Wt, denom=a)
g <- compilepcf(D=D, r=r, weights=Wt, denom= a * 2 * pi * r)
@
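The denominators used in this example can be checked against the usual
form of the estimators. In the following sketch, the symbols
$n$ (the number of points), $|W|$ (the area of the window) and
$\widehat\lambda = n/|W|$ (the estimated intensity) are notation
introduced here for illustration.
The usual estimator of the \Kfun{} has the form
\[
   \widehat K(r) = \frac{|W|}{n(n-1)}
   \sum_i \sum_{j \neq i} w_{i,j} \; 1\{ d_{i,j} \le r\} ,
\]
so that the denominator is the constant
\[
   v(r) = \frac{n(n-1)}{|W|} = (n-1) \, \widehat\lambda ,
\]
which is the value \texttt{a} in the code above.
In two dimensions the pair correlation function satisfies
$g(r) = K'(r)/(2 \pi r)$, so the corresponding denominator is
\[
   v(r) = 2 \pi r \, (n-1) \, \widehat\lambda ,
\]
which is the value \texttt{a * 2 * pi * r} in the code above.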
The result of \texttt{compileK} or \texttt{compilepcf} can then
be edited (as explained in the next section)
to change the function name and other information as desired.
\subsubsection{Cumulative distribution functions}
Programmers wishing to implement a summary function
which is a cumulative distribution function, similar to
the functions \texttt{Gest} or \texttt{Fest},
can use the command \texttt{compileCDF}.
The basic syntax of \texttt{compileCDF} is:
<<eval=FALSE>>=
compileCDF(D, B, r, ..., han.denom = NULL)
@
where
\begin{itemize}
\item \texttt{D} is a numeric vector of observed distances (such as the distance
from each data point to its nearest neighbour);
\item \texttt{B} is a numeric vector of censoring distances
(such as the distance from each data point to the boundary of the window);
\item \texttt{r} is a vector of distance values, equally spaced, at which
the summary function should be calculated;
\item \texttt{han.denom} is the denominator for the Hanisch estimator.
It is usually a numeric
vector with the same length as \texttt{r}.
\end{itemize}
An example for the nearest-neighbour distance distribution function $G(r)$:
<<>>=
X <- japanesepines
D <- nndist(X)
B <- bdist.points(X)
r <- seq(0, 1, by=0.01)
h <- eroded.areas(Window(X), r)
G <- compileCDF(D=D, B=B, r=r, han.denom=h)
## give it a better name
G <- rebadge.fv(G, new.fname="G", new.ylab=quote(G(r)))
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Editing the auxiliary information in \fv\ objects}
The ``auxiliary information'' in an \obj\fv\ consists of the
function name, a mathematical expression for the function,
mathematical expressions for each version of the function contained in a
column of data, the choice of which columns will be plotted by default,
and other information.
A programmer will often wish to create an \fv\ object first,
perhaps using some existing code, and then edit the auxiliary information.
The safe way to edit the auxiliary information
is to \textbf{use the internal functions in \spst} which support the \fv\ class:
\begin{itemize}
\item \texttt{rebadge.fv} is a low-level function which
allows the user to change any of the entries in the auxiliary information
as desired.
\item \texttt{rebadge.as.crossfun} and \texttt{rebadge.as.dotfun}
are wrappers for \texttt{rebadge.fv} which change the auxiliary information
into the form expected for a cross-type or dot-type summary function.
\item \texttt{fvlabels} extracts the mathematical code for each
version of the summary function, and \verb!fvlabels<-! changes the codes.
\item \texttt{makefvlabel} creates suitable mathematical code
for a version of the summary function.
\item The functions \texttt{fvnames} and \verb!fvnames<-! manage the
definition of the abbreviations listed in Table~\ref{tab:fvnames}.
\item The methods \texttt{formula.fv} and \verb!formula<-.fv! manage the
default plotting formula.
\item \verb!names<-.fv! changes the names of the columns in the
\fv\ object, and adjusts the internal data accordingly.
\item \texttt{tweak.fv.entry}
is a very low-level function that changes the auxiliary information
about one of the columns of data.
\item \texttt{prefixfv} is another wrapper for \texttt{rebadge.fv}
that adds a prefix to the name of the function.
\end{itemize}
\subsection{Low-level editing}
\texttt{rebadge.fv} is a low-level function which
allows the user to change any of the entries in the auxiliary information
as desired. It has syntax
<<eval=FALSE>>=
rebadge.fv(x, new.ylab, new.fname,
tags, new.desc, new.labl,
new.yexp=new.ylab, new.dotnames,
new.preferred, new.formula, new.tags)
@
where \texttt{x} is the \obj\fv.
The arguments \texttt{new.fname}, \texttt{new.ylab} and
\texttt{new.yexp} (if present) specify new values for the
function name \texttt{fname} and the mathematical expressions
for the function, \texttt{ylab} and \texttt{yexp}, described in
Section~\ref{S:creator}.
The argument \texttt{new.dotnames} specifies a new value for
the selection of columns that are plotted by default.
This is a character vector of column names of \texttt{x}
and is associated with the abbreviation ``\verb!.!'' in
Table~\ref{tab:fvnames}.
The argument \texttt{new.preferred} specifies a new value for
the choice of column that is designated the ``preferred'' column
and is used in calculations which require a single column of data,
such as simulation envelopes.
This is a single character string which must be a column
name of \texttt{x}
and is associated with the abbreviation ``\verb!.y!'' in
Table~\ref{tab:fvnames}.
The argument \texttt{new.formula} specifies a new default plotting formula
for the summary function.
The argument \texttt{new.desc} specifies new values for the string
descriptions of the individual columns, replacing the argument \texttt{desc}
described in Section~\ref{S:creator}. It should be a character vector
with one entry for every column of \texttt{x} (or see below).
The argument \texttt{new.labl} specifies new values for the mathematical labels
of the individual columns, replacing the argument \texttt{labl}
described in Section~\ref{S:creator}. It should be a character vector
with one entry for every column of \texttt{x} (or see below).
The argument \texttt{tags} can be used to select some of the columns of data
so that only the auxiliary data for the selected columns will be changed.
It should be a character vector with entries which match the names
of columns of \texttt{x}. In that case, \texttt{new.desc} and \texttt{new.labl}
should have the same length as \texttt{tags},
and they will be taken as replacement values for the selected columns only.
The optional argument \texttt{new.tags} changes the names of the
columns of \texttt{x} (or the columns selected by \texttt{tags}) to new values.
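A short sketch of typical usage follows. The new function name \texttt{"H"}
and the new description are invented for this illustration, and the column
name \texttt{"iso"} is the one produced by \texttt{Kest} for the
isotropic correction.
<<eval=FALSE>>=
library(spatstat)
K0 <- Kest(cells)
## rename the function from K to H
H <- rebadge.fv(K0, new.fname="H", new.ylab=quote(H(r)))
## change the description of the column "iso" only
H <- rebadge.fv(H, tags="iso",
                new.desc="isotropic-corrected estimate of %s")
H
@
The column names and the numerical values are unchanged; only the
auxiliary information is different.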
\subsection{Changing information about one column}
The method \verb!names<-.fv! changes the names of the columns in the
\fv\ object, and adjusts the internal data accordingly.
The function \texttt{tweak.fv.entry}
is a very low-level function that changes the auxiliary information
about one of the columns of data. It has syntax
<<eval=FALSE>>=
tweak.fv.entry(x, current.tag, new.labl=NULL, new.desc=NULL, new.tag=NULL)
@
where \texttt{current.tag} is the current name of the column for which
the information should be changed, \texttt{new.labl} is the new
mathematical label for the column, \texttt{new.desc} is the
new text description of the column, and \texttt{new.tag} is the new
name of the column. All these arguments are single strings
or \texttt{NULL}.
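For example, the following sketch changes only the text description of
the column \texttt{"border"} in an estimate of the \Kfun.
The replacement text is invented for this illustration.
<<eval=FALSE>>=
library(spatstat)
K0 <- Kest(cells)
## change the description of one column, leaving everything else alone
K1 <- tweak.fv.entry(K0, "border",
                     new.desc="reduced-sample estimate of %s")
K1
@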
\subsection{Special idioms}
A few functions are available for performing special idioms.
The function \texttt{prefixfv} is a wrapper for \texttt{rebadge.fv}
that adds a prefix to the name of the function, and to all the
relevant auxiliary information. It has syntax
<<eval=FALSE>>=
prefixfv(x, tagprefix="", descprefix="", lablprefix=tagprefix,
whichtags=fvnames(x, "*"))
@
where \texttt{tagprefix}, \texttt{descprefix} and \texttt{lablprefix}
are strings that should be added to the beginning
of the column name, the text description, and the mathematical expression
for each column of data. The argument \texttt{whichtags} specifies which
columns of data should be changed; the default is to change all columns.
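As a hedged sketch, suppose we wanted to relabel every column of an
estimate of the \Kfun{} as a variance. The prefixes below are
invented for this illustration. We assume that \texttt{lablprefix} is pasted
directly in front of the existing mathematical label, so it should itself be
valid \texttt{plotmath} code ending in a connector such as \verb!~!.
<<eval=FALSE>>=
library(spatstat)
K0 <- Kest(cells)
## add the prefix "v" to column names, and matching text and labels
vK <- prefixfv(K0, tagprefix="v",
               descprefix="variance of",
               lablprefix="bold(var)~")
names(K0)
names(vK)
@
The column containing the function argument is not renamed, because it is
not included in the default value of \texttt{whichtags}.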
The function \texttt{rebadge.as.crossfun} changes the auxiliary information
into the form expected for a bivariate, cross-type summary function,
analogous to the bivariate $K$-function $K_{i,j}(r)$ between two types of points
labelled $i$ and $j$ that is computed by the \spst\ function \texttt{Kcross}.
It has syntax
<<eval=FALSE>>=
rebadge.as.crossfun(x, main, sub=NULL, i, j)
@
where \texttt{main} is the main part of the function name,
\texttt{sub} is the subscript part of the function name,
and \texttt{i} and \texttt{j} are the type labels.
For example
<<eval=FALSE>>=
rebadge.as.crossfun(x, "L", i="A", j="B")
@
would create a function $L_{A,B}(r)$, and
<<eval=FALSE>>=
rebadge.as.crossfun(x, "L", "inhom", "A", "B")
@
would create a function $L_{\mbox{\scriptsize inhom},A,B}(r)$.
Similarly the function \texttt{rebadge.as.dotfun}
changes the auxiliary information
into the form expected for a ``one type-to-any type'' summary function,
analogous to the function $K_{i \bullet}(r)$
that is computed by the \spst\ function \texttt{Kdot}.
It has syntax
<<eval=FALSE>>=
rebadge.as.dotfun(x, main, sub=NULL, i)
@
\subsection{Handling mathematical labels}
The auxiliary information in an \fv\ object includes
mathematical labels for the different versions of the function,
which are displayed by \texttt{plot.fv}.
The function \texttt{fvlabels} extracts the mathematical code for each
version of the summary function from the \fv\ object,
and \verb!fvlabels<-! changes the codes.
The mathematical codes are strings
which must be recognisable to the \texttt{plotmath}
code in the \R\ base graphics system, which is somewhat idiosyncratic.
The strings may also (and usually do)
include the substring \verb!%s! (appearing once or twice)
which will be replaced by the
function name. For example, if the function name is
\texttt{"K"} and the label is \verb!"hat(%s)[iso](r)"! this will be
parsed as \verb!hat(K)[iso](r)! which is rendered
as $\widehat K_{\mbox{\scriptsize iso}}(r)$.
The function \texttt{makefvlabel} creates suitable mathematical code
for a version of the summary function. Programmers are strongly advised
to use \texttt{makefvlabel}.
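The stored codes can be inspected as in the following sketch, which uses
the pair correlation function of the \texttt{redwood} data purely as an
example. The argument \texttt{expand=TRUE} asks for the function name to be
substituted into the codes.
<<eval=FALSE>>=
library(spatstat)
g <- pcf(redwood)
## label codes as stored, containing the placeholder %s
fvlabels(g)
## the same codes with the function name substituted
fvlabels(g, expand=TRUE)
@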
\subsection{Changing default behaviour}
The default behaviour for plotting an \fv\ object depends
on its default \texttt{plot formula} and typically on its
\texttt{dot names}.
The default plot formula is printed when the object is printed,
and can be extracted using \texttt{formula.fv}.
<<>>=
K <- Kest(cells)
formula(K)
@
The interpretation of the plot formula is explained
in Section~\ref{S:plot.formula}. In the example above, the left hand side
of the formula uses the abbreviation ``\verb!.!'' which stands
for ``the default list of columns to be plotted''. This abbreviation can
be expanded using \texttt{fvnames}:
<<>>=
fvnames(K, ".")
@
which indicates that the columns named
\texttt{"iso"},
\texttt{"trans"},
\texttt{"border"} and
\texttt{"theo"} will be plotted.
The choice of ``dot names'' can be changed using \verb!fvnames<-!:
<<>>=
fvnames(K, ".") <- c("iso", "theo")
@
In general the functions \texttt{fvnames} and \verb!fvnames<-!
manage the
definition of all the abbreviations listed in Table~\ref{tab:fvnames}.
\section{Pooling several function estimates}
\subsection{Pooling}
``Pooling'' or combining several datasets into a single dataset
is a common statistical procedure. If we are only interested in a summary
statistic of the data, then in some special circumstances, the
summary statistic of the pooled dataset can be calculated from the
summary statistics of the original, separate datasets. For example,
if we have a set of $n_1$ observations with sample mean $m_1$,
and another set of $n_2$ observations with sample mean $m_2$,
then the pooled set of $n_1+n_2$ observations has
sample mean $(n_1 m_1 + n_2 m_2)/(n_1+n_2)$, a weighted average of the
sample means of the original datasets. This procedure is loosely called
``pooling'' the sample mean.
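The pooling formula for the sample mean can be verified numerically in
base \R. The data values in this sketch are made up for illustration.
<<eval=FALSE>>=
## two samples of different sizes
x1 <- c(2, 4, 6)
x2 <- c(1, 3, 5, 7, 9)
n1 <- length(x1); m1 <- mean(x1)
n2 <- length(x2); m2 <- mean(x2)
## pooled mean computed from the summary statistics alone
pooled <- (n1 * m1 + n2 * m2)/(n1 + n2)
## compare with the mean of the combined data
all.equal(pooled, mean(c(x1, x2)))
@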
If we have two point pattern datasets, observed in different windows,
we can ``pool'' the patterns by simply
treating them as a single point pattern observed in the combined window.
If we pool two point pattern datasets, the estimated $K$-function of the
pooled pattern can be calculated from the estimated $K$-functions
$K_1(r)$ and $K_2(r)$ of the original point patterns, if we know the number of
points in each of the two original patterns.
That is, Ripley's $K$-function can be ``pooled''.
The summary functions used in spatial statistics can be pooled,
provided they can be expressed as a ratio
$f(r) = A(r)/B$ or $f(r) = A(r)/B(r)$ where $A(r)$ is the ``numerator''
and $B$ or $B(r)$ is the ``denominator''. The pooled estimate is the
sum of the numerators divided by the sum of the denominators.
For details, see section 16.8.1 of \cite{baddrubaturn15}.
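In symbols, as a sketch using notation introduced here: suppose there are
$m$ datasets, and the $i$th dataset gives the estimate
$f_i(r) = A_i(r)/B_i(r)$. Then the pooled estimate is
\[
   \bar f(r) = \frac{\sum_{i=1}^m A_i(r)}{\sum_{i=1}^m B_i(r)}
             = \sum_{i=1}^m \frac{B_i(r)}{\sum_{k=1}^m B_k(r)} \; f_i(r),
\]
a weighted average of the individual estimates, with weights proportional
to the denominators. Taking $A_i = n_i m_i$ and $B_i = n_i$ recovers the
pooled sample mean $(n_1 m_1 + n_2 m_2)/(n_1 + n_2)$ when $m = 2$.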
\subsection{Pooling summary functions}
The generic function \texttt{pool} performs pooling of summary statistics
(including summary functions like the $K$-function).
In order for this to work correctly, we must know the numerator and
denominator for each of the individual summary statistics or summary functions.
For this purpose there is a special class \class{rat} (for ``ratio object'').
An \obj\rat\ contains two attributes named
\texttt{"numerator"} and \texttt{"denominator"} which contain the
numerator and denominator of the ratio.
For many of the summary functions provided in \spst,
if we set the argument \texttt{ratio=TRUE}, the numerator and
denominator will be calculated separately and saved in the resulting
object, which will belong to the class \class{rat} (``ratio object'')
as well as \fv.
<<>>=
class(Kest(cells))
class(Kest(cells, ratio=TRUE))
@
This capability is currently available for the functions
\texttt{compileK},
\texttt{compilepcf},
\texttt{Finhom},
\texttt{Gcross.inhom},
\texttt{Gdot.inhom},
\texttt{Ginhom},
\texttt{GmultiInhom},
\texttt{Jcross.inhom},
\texttt{Jdot.inhom},
\texttt{Jinhom},
\texttt{Jmulti.inhom},
\texttt{K3est},
\texttt{Kcross},
\texttt{Kdot},
\texttt{Kest},
\texttt{Kinhom},
\texttt{Kmulti},
\texttt{Ksector},
\texttt{linearKinhom},
\texttt{linearK},
\texttt{linearpcfinhom},
\texttt{linearpcf},
\texttt{nnorient},
\texttt{pairorient},
\texttt{pcfcross},
\texttt{pcfdot},
\texttt{pcfmulti},
\texttt{pcf.ppp} and
\texttt{Tstat}.
The method \texttt{pool.rat} will pool several objects which all belong
to the classes \class{fv} and \class{rat}:
<<>>=
X1 <- runifpoint(50)
X2 <- runifpoint(50)
K1 <- Kest(X1, ratio=TRUE)
K2 <- Kest(X2, ratio=TRUE)
K <- pool(K1, K2)
@
<<>>=
Xlist <- runifpoint(50, nsim=6)
Klist <- lapply(Xlist, Kest, ratio=TRUE)
K <- do.call(pool, Klist)
@
There is also a fallback method \texttt{pool.fv}
which is used when some of the objects do not contain ratio information.
This method effectively pretends that all the objects have the same
denominator.
\subsection{Low level utilities}
Programmers wishing to implement a summary function with
ratio information can use the following low-level utilities:
\begin{itemize}
\item \texttt{ratfv} is the creator function, analogous to \texttt{fv}.
Its syntax is
<<eval=FALSE>>=
ratfv(df, numer, denom, ..., ratio=TRUE)
@
where \texttt{df} is a data frame, \texttt{numer} and \texttt{denom}
are \objs\fv, and additional arguments \verb!...! are passed to \texttt{fv}.
It is sufficient to specify either \texttt{df} or \texttt{numer},
in addition to \texttt{denom}.
\item \texttt{bind.ratfv} glues extra columns onto an existing
\obj\fv\ and \class{rat}.
\item \texttt{conform.ratfv} forces the auxiliary information
in the numerator and denominator of an \obj\fv\ and \class{rat}
to agree with the auxiliary information of the main object.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Structure of \objs\fv}
This section explains the information contained in \objs\fv.
\subsection{Advice}
We strongly discourage the user from unpacking the internal
contents of \objs\fv{} and manipulating the contents directly.
Instead, we recommend using the functions that are available in \spst{}
for handling these objects.
Although it is easy to extract the internal data contained in an object
in \R, the structure of \objs\fv\ is idiosyncratic, and the internal format
is variable. Looking at one example of an \obj\fv\ will not tell you how
it all works. This is because there are many cases to handle,
and many quirks in the formatting of algebraic expressions in \R.
Using the functions provided in \spst\ is also more efficient
than extracting data yourself, because it avoids creating multiple
copies of the data.
Most of all, \textbf{do not change the internal contents of \objs\fv}.
This can easily violate the internal format and cause errors.
Use the functions supplied for handling these objects.
\subsection{Objects of class \fv}
Objects of class \fv\ are returned by many commands in the
\spst\ packages. Usually these objects are obtained by
analysing a spatial point pattern dataset.
There are also functions to create such objects from raw data.
An \obj\fv{} is essentially a data frame with
additional attributes containing auxiliary information.
\subsubsection*{Data frame structure}
The first column of the data frame contains values of the
function argument. These values are arranged in increasing order,
are usually evenly-spaced, and usually start from zero. The first
column usually (but not always) has the column name \texttt{r}.
Subsequent columns of the data frame contain the values of
different versions of the summary function, corresponding to the
values of the function argument. These columns may have any column names.
These versions of the function may be referred to by their column names
when plotting and manipulating the object.
<<>>=
G <- Gest(finpines)
df <- as.data.frame(G)
head(df)
@
In this example, the object \texttt{G} contains estimates of
the nearest-neighbour distance distribution function $G(r)$
for the \texttt{finpines} dataset. For the distance
value $r = $ \Sexpr{round(df[6, "r"], 9)} metres, the estimate of $G(r)$
using the \texttt{han} method is \Sexpr{round(df[6, "han"],8)}.
Columns of data can be extracted using the data frame structure.
To extract the sequence of \texttt{r} values,
use \verb!df$r! or \verb!G$r! or \verb!df[, "r"]!.
To extract the corresponding values of
\texttt{han}, use \verb!df$han! or \verb!G$han! or \verb!df[, "han"]!.
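The following sketch confirms that these ways of extracting a column
give the same result. The names \texttt{h1}, \texttt{h2} and \texttt{h3}
are chosen for illustration.
<<eval=FALSE>>=
library(spatstat)
G <- Gest(finpines)
df <- as.data.frame(G)
## three equivalent ways to extract the Hanisch estimate
h1 <- df$han
h2 <- G$han
h3 <- df[, "han"]
identical(h1, h2) && identical(h1, h3)
@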
\subsubsection*{Auxiliary information}
In the example above,
to find out what the column \texttt{han} means, we need the
auxiliary information stored in the object \texttt{G}.
This can be printed out directly in readable form:
<<>>=
G
@
Thus, \texttt{han} refers to the estimate of $G(r)$ using Hanisch's method.
The auxiliary information is stored in attributes
of the object. The full list of attributes is as follows:
\begin{tabular}{lll}
\texttt{argu} & character(1) & Name of function argument (usually \texttt{"r"}) \\
\texttt{valu} & character(1) & Name of preferred function value \\
\texttt{ylab} & language & Mathematical expression for function (for printing) \\
\texttt{yexp} & language & Mathematical expression for function (for vertical axis of plot) \\
\texttt{fmla} & character(1) & Default plotting formula \\
\texttt{alim} & numeric(2) & Recommended range of function argument \\
\texttt{labl} & character($m$) & Mathematical labels for each column\\
\texttt{desc} & character($m$) & Text descriptions of each column\\
\texttt{units} & unitname & Unit of length (for function argument) \\
\texttt{fname} & character(1 or 2) & Symbol for function only \\
\texttt{dotnames} & character($k \le m$) & Column names of all recommended versions \\
\texttt{shade} & character(0 or 2) & Column names of limits of grey shading\\
\end{tabular}
\code{argu} is the name of the column of the data frame
that contains the values of the function argument
(typically \code{argu="r"} but this is not compulsory).
\code{valu} specifies the name of another column
that contains the `recommended' estimate of the function.
It will be used to provide function values in those situations where
a single column of data is required. For example,
\code{envelope} computes its simulation envelopes
using the recommended value of the summary function.
\code{fmla} specifies the default plotting behaviour,
as explained in Section~\ref{S:plot.fv}. It is a character string
that can be converted to a \texttt{formula} in the \R\ language.
\code{alim} specifies the recommended range of the
function argument. It is a numeric vector of length 2.
This is used in situations where statistical
theory or statistical practice indicates that the computed
estimates of the function are not trustworthy outside a certain
range of values of the function argument. By default,
\code{plot.fv} will restrict the plot to this range.
\code{fname} gives the name of the function itself.
For example, the \Kfun{} would have \code{fname="K"}.
It is either a character string, or a vector of two character strings,
where the second element is interpreted as a subscript.
For example, the inhomogeneous \Kfun{} computed by \code{Kinhom}
has \code{fname=c("K", "inhom")}.
\code{ylab} is a mathematical expression
for the function value, used when printing a description of the
function. It is an \R{} language object.
For example the \Kfun's mathematical name $K(r)$ is rendered
by \code{ylab=quote(K(r))}.
\code{yexp} is another mathematical expression
for the function value, used for annotating axes in a plot.
\code{labl} is a character vector specifying plot labels
for each column of the data frame. These labels will appear on the
plot axes (in non-default plots), legends and printed output.
Entries in \code{labl}
may contain the string \code{"\%s"} which will be replaced
by \code{fname}.
\code{desc} is a character vector containing intelligible
explanations of each column of the data frame. Entries in
\code{desc} may contain the string \code{"\%s"} which will be replaced
by \code{ylab}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Structure of \objs\env}
This section explains the information contained in \objs\env.
\subsection{The \texttt{envelope} command}
The \spst\ function \fun{envelope} performs the
calculations required for envelopes. It computes the
summary function for a point pattern dataset, generates
simulated point patterns, computes the summary functions for the
simulated patterns, and computes the envelopes of these summary
functions.
<<>>=
E <- envelope(swp, Kest, nsim=39, fix.n=TRUE)
@
The result is an object of both classes \class{envelope} and \class{fv}.
It can be printed, plotted and manipulated
using the tools for \class{fv} objects, together with additional tools
provided for \class{envelope} objects.

The print method gives a lot of detail:
<<>>=
E
@
\subsection{Re-using envelope data}
The method \texttt{envelope.envelope} allows new \fun{envelope} commands to
be applied to a previously computed \class{envelope} object,
provided it contains the necessary data.
In the original call to \fun{envelope}, if the argument
\texttt{savepatterns=TRUE} was given, the resulting \class{envelope}
object contains all the simulated point patterns. Alternatively,
if the argument \texttt{savefuns=TRUE} was given, the resulting
object contains the individual summary functions for each of the
simulated patterns. By default, neither kind of information is saved,
for the sake of efficiency.

Envelopes created with \texttt{savepatterns=TRUE} allow any kind of
new envelopes to be computed using the same simulated point patterns:
<<>>=
E1 <- envelope(redwood, Kest, savepatterns=TRUE)
E2 <- envelope(E1, Gest, global=TRUE,
               transform=expression(fisher(.)))
@
Envelopes created with \texttt{savefuns=TRUE} allow
the user to switch between pointwise and global envelopes
of the same summary function, to apply different transformations
of the summary function, and to change some parameters:
<<>>=
A1 <- envelope(redwood, Kest, nsim=39, savefuns=TRUE)
A2 <- envelope(A1, global=TRUE, nsim=19,
               transform=expression(sqrt(./pi)))
@
\subsection{Pooling several envelopes}
It is also possible to combine the simulation data
from several envelope objects
and to compute envelopes based on the combined data.
This is done using \fun{pool.envelope}, a method for
the \spst\ generic \fun{pool}.
The envelopes must be compatible,
in that they are envelopes for the same function,
and were computed using the same options. The
individual summary functions must have been saved.
<<>>=
E1 <- envelope(cells, Kest, nsim=10, savefuns=TRUE)
E2 <- envelope(cells, Kest, nsim=20, savefuns=TRUE)
E <- pool(E1, E2)
@
\subsection{Structure of envelope objects}
An \obj\env{} is an \obj\fv{} with additional auxiliary information:
\begin{itemize}
\item
the names of two of the columns of function values,
designated as the upper and lower simulation envelopes of the function,
saved in \texttt{attr(, "shade")} and retrievable as
\texttt{fvnames(, ".s")}
\item
details of how the envelopes were computed,
saved in \texttt{attr(, "einfo")}
\item
optionally, the simulated point patterns used to compute the envelopes,
saved in \texttt{attr(, "simpatterns")}
\item
optionally, the simulated summary functions
(the summary functions computed for the simulated point patterns)
used to compute the envelopes,
saved in \texttt{attr(, "simfuns")}
\end{itemize}
Objects of class \env\ inherit the class \fv, so they can be manipulated
using methods for class \fv, but there are extra methods for the
special class \env.
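
The components listed above can be examined directly.
The following is a minimal sketch, assuming that the attribute names are
exactly those given in the list; the object name \code{Ec} is illustrative,
and both \texttt{savefuns=TRUE} and \texttt{savepatterns=TRUE} are requested
so that the optional components are present.
<<eval=FALSE>>=
library(spatstat)
Ec <- envelope(cells, Kest, nsim=19,
               savefuns=TRUE, savepatterns=TRUE, verbose=FALSE)
class(Ec)                        # includes both "envelope" and "fv"
attr(Ec, "shade")                # names of the two envelope columns
fvnames(Ec, ".s")                # the same names, via fvnames
names(attr(Ec, "einfo"))         # details of how the envelopes were computed
length(attr(Ec, "simpatterns"))  # number of saved simulated patterns
class(attr(Ec, "simfuns"))       # saved simulated summary functions
@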
\subsection{The \texttt{einfo} list}
The attribute \texttt{einfo} is a list with the following entries:

\begin{tabular}{lll}
\texttt{call} & character(1) & Original function call \\
\texttt{Yname} & character(1) & Name of original dataset \\
\texttt{valname} & character(1) & Column name of function values used \\
\texttt{csr} & logical(1) & \texttt{TRUE} if simulations are based on CSR \\
\texttt{csr.theo} & logical(1) & See below \\
\texttt{use.theory} & logical(1) & See below \\
\texttt{pois} & logical(1) & \texttt{TRUE} if simulations are Poisson process \\
\texttt{simtype} & character(1) & Type of simulation (see below) \\
\texttt{constraints} & character(1) & Additional information (see below) \\
\texttt{nrank} & integer(1) & Rank of envelopes \\
\texttt{nsim} & integer(1) & Number of simulations for envelope \\
\texttt{Nsim} & integer(1) & Total number of simulations \\
\texttt{global} & logical(1) & \texttt{TRUE} if global envelopes \\
\texttt{ginterval} & numeric(0 or 2) & Domain of function argument for global envelopes \\
\texttt{dual} & logical(1) & \texttt{TRUE} if two sets of simulations performed \\
\texttt{nsim2} & integer(1) & Number of simulations in second set \\
\texttt{VARIANCE} & logical(1) & \texttt{TRUE} if limits are based on standard deviation \\
\texttt{nSD} & numeric(1) & Number of standard deviations defining limits \\
\texttt{alternative} & character(1) & \texttt{two.sided}, \texttt{less} or \texttt{greater} \\
\texttt{scale} & \texttt{NULL} or function & Scaling function for function argument \\
\texttt{clamp} & logical(1) & \texttt{TRUE} if one-sided deviations must be positive \\
\texttt{use.weights} & logical(1) & \texttt{TRUE} if sample mean is weighted \\
\texttt{do.pwrong} & logical(1) & \texttt{TRUE} if ``wrong $p$-value'' should be calculated \\
\texttt{gaveup} & logical(1) & \texttt{TRUE} if simulations terminated early
\end{tabular}
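
Individual entries of the \texttt{einfo} list can be read in the usual way
for a list. The following is a minimal sketch, assuming that the entries
carry the names shown in the table; the names \code{Ep} and \code{info}
are illustrative.
<<eval=FALSE>>=
library(spatstat)
Ep <- envelope(cells, Kest, nsim=39, verbose=FALSE)
info <- attr(Ep, "einfo")   # list describing how the envelopes were computed
info$nsim                   # number of simulations used for the envelope
info$nrank                  # rank of the envelopes
info$global                 # TRUE if the envelopes are global
info$alternative            # "two.sided", "less" or "greater"
@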
\begin{thebibliography}{1}
\bibitem{baddrubaturn15}
A. Baddeley, E. Rubak, and R. Turner.
\newblock {\em Spatial Point Patterns: Methodology and Applications with {{R}}}.
\newblock Chapman \& Hall/CRC Press, 2015.
\bibitem{besa77d}
J.E. Besag.
\newblock Contribution to the discussion of the paper by Ripley (1977).
\newblock \emph{Journal of the Royal Statistical Society, Series B}
\textbf{39} (1977) 193--195.
\bibitem{cox77discuss}
D.R. Cox.
\newblock Contribution to the discussion of the paper by Ripley (1977).
\newblock \emph{Journal of the Royal Statistical Society, Series B}
\textbf{39} (1977) 206.
\end{thebibliography}
\end{document}