% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get.R
\name{getIndex}
\alias{getIndex}
\title{Get an Index of Available Argo Float Profiles}
\usage{
getIndex(
  filename = "core",
  server = argoDefaultServer(),
  destdir = argoDefaultDestdir(),
  age = argoDefaultIndexAge(),
  quiet = FALSE,
  keep = FALSE,
  debug = 0L
)
}
\arguments{
\item{filename}{character value that indicates the file name to be downloaded
from a remote server, or (if \code{server} is set to \code{NULL}) the name of a local
file.  For the remote case, the value of \code{filename} must be taken from
the first column of the table given in \dQuote{Details}, or (for some file types)
from the nickname given in the middle column. Nicknames are expanded to full
file names before downloading, so the name of the downloaded file is always
based on the full file name.  The downloaded file is in gzipped format
(indicated by a file name ending in \code{.gz}), and \code{\link[=getIndex]{getIndex()}}
examines and processes it to produce an
R archive file (ending in \code{.rda}) that is stored locally. The \code{.gz} file
is discarded by default, unless \code{keep} is set to \code{TRUE}.  (See also
the documentation on the \code{server} parameter, next, and the subsection
entitled \dQuote{Using a previously downloaded index} in \dQuote{Details}.)}

\item{server}{an indication of the source for \code{filename}.  There are two
possibilities.  (1) If \code{server} is \code{NULL}, then \code{filename} is taken
to be the name of a local index file (either a raw \code{.gz} file or an
\code{.rda} file produced by \code{\link[=getIndex]{getIndex()}}) that
was previously downloaded from a server.  The easiest way to get
such a file is with a previous call to \code{\link[=getIndex]{getIndex()}} with \code{keep} set
to \code{TRUE}.  (2) If \code{server} is a character vector (as it is by default),
it is taken to represent remote servers to be tried as sources
for an index file.  The use of multiple servers is a way to avoid
errors that can result if a server refuses a download request.
As of March 2023, the three servers known to work are
\code{"https://data-argo.ifremer.fr"}, \code{"ftp://ftp.ifremer.fr/ifremer/argo"} and
\code{"https://usgodae.org/pub/outgoing/argo"}.
These may be referred to with the nicknames \code{"ifremer-https"},
\code{"ifremer"} and \code{"usgodae"}, respectively.
Any URL that can be used in \code{\link[curl:curl_download]{curl::curl_download()}} is a valid value, provided
that its file structure is identical to that of the mirrors listed above. See
\code{\link[=argoDefaultServer]{argoDefaultServer()}} for how to provide a default value for \code{server}.}

\item{destdir}{character value indicating the directory in which to store
downloaded files. By default, this is computed with
\code{\link[=argoDefaultDestdir]{argoDefaultDestdir()}}, which returns \verb{~/data/argo},
although other values can be set using
\code{\link[=options]{options()}}.
Set \code{destdir=NULL}
if \code{filename} is a file name with full path information.
File clutter is reduced by creating a top-level directory called
\code{data}, with subdirectories for various file types; see
\dQuote{Examples}.}

\item{age}{numeric value indicating how old a downloaded file
must be (in days) for it to be considered out-of-date.  The
default, \code{\link[=argoDefaultIndexAge]{argoDefaultIndexAge()}}, limits downloads to once per day, as a way
to avoid slowing down a workflow with a download that might take
a minute or so.  Setting \code{age=0} forces a new
download, regardless of the age of the local file, and \code{age}
is set to 0 whenever \code{keep} is \code{TRUE}.  The value of \code{age}
is ignored if \code{server} is \code{NULL} (see \dQuote{Using a previously
downloaded index} in \dQuote{Details}).}

\item{quiet}{logical value indicating whether to silence some
progress indicators.  The default is to show such indicators.}

\item{keep}{logical value indicating whether to retain the
raw index file as downloaded from the server.  This is \code{FALSE}
by default, indicating that the raw index file is to be
discarded once it has been analyzed and used to create a cached
file (which is an RDA file).  Note that if \code{keep}
is \code{TRUE}, then the supplied value of \code{age} is converted
to 0, to force a new download.}

\item{debug}{an integer indicating the level of debugging.  If this
is 0, then the function works somewhat quietly.  If it is 1, messages
are printed at various steps in the process. If it is any number higher
than 1, then those messages will be prefixed by an indication of
the time, down to the millisecond.}
}
\value{
An object of class \code{\linkS4class{argoFloats}} with type=\code{"index"}, which
is suitable as the first argument of \code{\link[=getProfiles]{getProfiles()}}.
}
\description{
This function gets an index of available Argo float profiles, typically
for later use as the first argument to \code{\link[=getProfiles]{getProfiles()}}. The source for the
index may be (a) a remote data repository, (b) a local copy of an index file
previously downloaded from such a repository (see the \code{server} and
\code{keep} arguments), or (c) a cached RDA file that contains the result
of a previous call to \code{\link[=getIndex]{getIndex()}} (see the \code{age} parameter).
}
\details{
\strong{Using an index from a remote server}

The first step is to construct a URL for downloading, based on the
\code{server} and \code{filename} arguments. That URL will be a string ending in \code{.gz}
or \code{.txt}, and from it the name of a local file is constructed
by changing the suffix to \code{.rda} and prepending the directory
specified by \code{destdir}.  If an \code{.rda} file of that name already exists
and is less than \code{age} days old, then no downloading takes place. This
caching procedure saves time, because the download can take
from a minute to an hour, depending on the bandwidth of the connection
to the server.
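
The download-and-cache logic described above can be sketched in plain R as
follows. This is a minimal illustration under stated assumptions, not the code
used inside \code{getIndex()}: the function name \code{fetchIndexSketch()} is
hypothetical, the remote URL is assumed to be the server address followed by
\code{/} and the file name, and the sketch stops after downloading the
\code{.gz} file, leaving aside the step that parses it into an \code{.rda} file.
Servers are tried in the order given, so a refused request falls through to
the next one.

\preformatted{
## Hypothetical sketch of the caching and server-fallback steps;
## not the argoFloats implementation.
fetchIndexSketch <- function(filename, servers, destdir, age,
                             download = utils::download.file) {
  ## Assumed naming rule: replace a final .gz or .txt suffix with .rda.
  rda <- file.path(destdir, sub("[.](gz|txt)$", ".rda", filename))
  ## Skip the download if the cached file is younger than 'age' days.
  if (file.exists(rda)) {
    ageDays <- as.numeric(difftime(Sys.time(), file.info(rda)$mtime,
                                   units = "days"))
    if (ageDays < age)
      return(list(rda = rda, downloaded = FALSE, server = NA_character_))
  }
  gz <- file.path(destdir, filename)
  for (server in servers) {
    url <- paste0(server, "/", filename)  # assumed URL layout
    unlink(gz)                            # discard any earlier partial file
    ok <- tryCatch({
      download(url, gz, mode = "wb", quiet = TRUE)
      file.exists(gz)
    }, error = function(e) FALSE)
    if (ok)
      return(list(rda = rda, downloaded = TRUE, server = server))
  }
  stop("could not download ", filename, " from any server")
}
}

Because \code{age=0} can never exceed the age of an existing file, that setting
always forces a download in this sketch, matching the behaviour described for
the \code{age} argument. Passing \code{curl::curl_download} as \code{download}
should also work, since it accepts \code{quiet} and \code{mode} arguments.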

The resultant \code{.rda} file, which is named in the return value of this
function, contains a list named \code{index} with the following elements:
\itemize{
\item \code{ftpRoot}, the FTP root stored in the header of the source file
named by \code{filename} (see next paragraph).
\item \code{server}, the URL at which the index was found, and from
which \code{\link[=getProfiles]{getProfiles()}} can construct URLs from which to
download the NetCDF files for individual float profiles.
\item \code{filename}, the argument provided here.
\item \code{header}, the preliminary lines in the source file that start
with the \verb{#} character.
\item \code{data}, a data frame containing the items in the source file.
The names of these items are determined automatically from the source
file, whether it is a \code{"core"}, \code{"bgcargo"} or \code{"synthetic"} index.
}
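
Because the cached file is an ordinary R data archive, it can also be
inspected directly, for instance when working outside \code{argoFloats}. The
sketch below assumes only what is stated above, namely that the file holds a
list named \code{index}; the helper name \code{readCachedIndex()} is
hypothetical. Loading into a fresh environment keeps the object from
overwriting anything named \code{index} in the workspace.

\preformatted{
## Hypothetical helper: read the 'index' list from a cached .rda file.
readCachedIndex <- function(rdaFile) {
  env <- new.env()
  load(rdaFile, envir = env)
  if (!exists("index", envir = env, inherits = FALSE))
    stop("no object named 'index' in ", rdaFile)
  index <- get("index", envir = env)
  ## Warn if any of the documented components is absent.
  expected <- c("ftpRoot", "server", "filename", "header", "data")
  absent <- setdiff(expected, names(index))
  if (length(absent) > 0)
    warning("missing components: ", paste(absent, collapse = ", "))
  index
}
}

With \code{f} holding the path to a cached \code{.rda} file,
\code{head(readCachedIndex(f)$data)} would show the first few rows of the index.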

Some expertise is required in deciding on the value for the
\code{filename} argument to \code{\link[=getIndex]{getIndex()}}.  As of March 2023, the
sites \verb{https://usgodae.org/pub/outgoing/argo} and
\verb{ftp://ftp.ifremer.fr/ifremer/argo}
contain multiple index files, as listed in the left-hand column of the
following table. The middle column lists nicknames
for some of the files.  These can be provided as the \code{filename} argument,
as alternatives to the full names.
The right-hand column describes the file contents.
Note that the servers also provide files with names similar to those
given in the table, but ending in \code{.txt}.  These are uncompressed
equivalents of the \code{.gz} files that offer no advantage and take
longer to download, so \code{\link[=getIndex]{getIndex()}} is not designed to work with them.
\tabular{lll}{
\emph{File Name}                           \tab \emph{Nickname}              \tab \emph{Contents}\cr
\code{ar_greylist.txt}                     \tab -                       \tab Suspicious/malfunctioning floats\cr
\code{ar_index_global_meta.txt.gz}         \tab -                       \tab Metadata files\cr
\code{ar_index_global_prof.txt.gz}         \tab \code{"argo"} or \code{"core"}    \tab Argo data\cr
\code{ar_index_global_tech.txt.gz}         \tab -                       \tab Technical files\cr
\code{ar_index_global_traj.txt.gz}         \tab \code{"traj"}                \tab Trajectory files\cr
\code{argo_bio-profile_index.txt.gz}       \tab \code{"bgc"} or \code{"bgcargo"}  \tab Biogeochemical Argo data (without S or T)\cr
\code{argo_bio-traj_index.txt.gz}          \tab \code{"bio-traj"}            \tab Bio-trajectory files\cr
\code{argo_synthetic-profile_index.txt.gz} \tab \code{"synthetic"}           \tab Synthetic data, successor to \code{"merge"}\cr
}
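
The nickname column of the table amounts to a simple lookup. A sketch of that
mapping as a named character vector follows; the names \code{indexNicknames}
and \code{expandIndexName()} are hypothetical, and values without a nickname are
passed through unchanged, on the assumption that they are already full file names.

\preformatted{
## Nicknames from the table above (hypothetical helper, not package API).
indexNicknames <- c(
  argo       = "ar_index_global_prof.txt.gz",
  core       = "ar_index_global_prof.txt.gz",
  traj       = "ar_index_global_traj.txt.gz",
  bgc        = "argo_bio-profile_index.txt.gz",
  bgcargo    = "argo_bio-profile_index.txt.gz",
  "bio-traj" = "argo_bio-traj_index.txt.gz",
  synthetic  = "argo_synthetic-profile_index.txt.gz"
)

## Expand a nickname; return any other value unchanged.
expandIndexName <- function(filename) {
  full <- unname(indexNicknames[filename])
  if (is.na(full)) filename else full
}
}

Indexing a named vector by name uses exact matching, so a partial nickname
such as \code{"syn"} is not expanded.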

\strong{Using a previously downloaded index}

In some situations, it can be desirable to work with a local
index file that has been copied directly from a remote server.
This can be useful for working with the files
in R separately from the \code{argoFloats} package, or in other languages such as Python.
It can also be useful for group work, in which it is important for
all participants to use the same source file.

This is handled by calling \code{\link[=getIndex]{getIndex()}} with \code{filename}
set to the full path name of the previously downloaded file and
\code{server} set to \code{NULL}. This works for
both the raw files as downloaded from the server (which end
in \code{.gz}) and the R-data-archive files produced by \code{\link[=getIndex]{getIndex()}}
(which end in \code{.rda}). Since the \code{.rda} files load an order
of magnitude faster than the \code{.gz} files, they are usually
preferred.  However, if the \code{.gz} files are needed,
perhaps because part of a software chain uses Python code that
works with such files, then note that calling
\code{getIndex()} with \code{keep=TRUE} saves the \code{.gz} file in
the \code{destdir} directory.
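
A possible workflow for the group-work case is sketched below. It relies on
the statement above that \code{keep=TRUE} leaves the raw file in
\code{destdir}, and further assumes that the file there carries the full name
\code{ar_index_global_prof.txt.gz}; the wrapper name \code{reuseCoreIndex()} is
hypothetical. Only the first call needs network access.

\preformatted{
## Hypothetical wrapper: download the core index once, then reuse the file.
reuseCoreIndex <- function(destdir = argoFloats::argoDefaultDestdir()) {
  ## Assumed location of the raw file saved by keep = TRUE.
  gz <- file.path(path.expand(destdir), "ar_index_global_prof.txt.gz")
  if (!file.exists(gz)) {
    ## First use (needs network): download and keep the raw .gz file.
    argoFloats::getIndex("core", destdir = destdir, keep = TRUE)
  }
  ## server = NULL: 'filename' is taken to be a local file.
  argoFloats::getIndex(filename = gz, server = NULL)
}
}

If every participant points \code{destdir} at a directory holding the same
shared \code{.gz} file, each builds the index from identical input.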
}
\references{
Kelley, D. E., Harbin, J., & Richards, C. (2021). argoFloats: An R package for analyzing
Argo data. Frontiers in Marine Science, 8, 635922.
\doi{10.3389/fmars.2021.635922}
}
\author{
Dan Kelley and Jaimie Harbin
}
