% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pulse_main.R
\name{PULSE_by_chunks}
\alias{PULSE_by_chunks}
\title{Process PULSE data file by file (\verb{STEPS 1-6})}
\usage{
PULSE_by_chunks(
  folder,
  allow_dir_create = FALSE,
  chunks = 2,
  bind_data = TRUE,
  window_width_secs = 30,
  window_shift_secs = 60,
  min_data_points = 0.8,
  interpolation_freq = 40,
  bandwidth = 0.2,
  doublecheck = TRUE,
  lim_n = 3,
  lim_sd = 0.75,
  raw_v_smoothed = TRUE,
  correct = TRUE,
  discard_channels = NULL,
  keep_raw_data = TRUE,
  show_progress = TRUE
)
}
\arguments{
\item{folder}{the path to a folder where several PULSE files are stored}

\item{allow_dir_create}{logical, defaults to \code{FALSE}. \code{PULSE_by_chunks()} only runs when this is set to \code{TRUE}. This forces the user to accept that a job_folder will be created inside the \code{folder} supplied; without this folder \code{PULSE_by_chunks()} cannot operate. It is STRONGLY advised to keep a copy of the dataset being processed to avoid any inadvertent data loss. By setting \code{allow_dir_create} to \code{TRUE}, the user takes responsibility for the management of their files.}

\item{chunks}{numeric, defaults to \code{2}. The number of files processed at once during each \code{for} cycle. Higher values make the operation quicker and more efficient, but should not be set too high, as the system may then become overwhelmed (which is what \code{PULSE_by_chunks()} is designed to avoid).}

\item{bind_data}{logical, defaults to \code{TRUE}. If set to \code{TRUE}, after processing all chunks, \code{PULSE_by_chunks()} will try to read all files in the job_folder and return a single unified tibble with all data. Be aware that if the dataset is very large, the machine may run out of memory and crash (even so, all files stored in the job_folder remain intact, and code can be written to analyze the data in chunks as well). If set to \code{FALSE}, \code{PULSE_by_chunks()} returns nothing after processing all files in the dataset, and the user must manually read and collate the processed data in the job_folder.}

\item{window_width_secs}{numeric, in seconds, defaults to \code{30}; the width of the time windows over which heart rate frequency will be computed.}

\item{window_shift_secs}{numeric, in seconds, defaults to \code{60}; by how much each subsequent window is shifted from the preceding one.}

\item{min_data_points}{numeric, defaults to \code{0.8}; decimal from 0 to 1, used as a threshold to discard incomplete windows where data is missing (e.g., if the sampling frequency is \code{20} and \code{window_width_secs = 30}, each window should include \code{600} data points, and so if \code{min_data_points = 0.8}, windows with fewer than \code{600 * 0.8 = 480} data points will be rejected).}

\item{interpolation_freq}{numeric, defaults to \code{40}; value expressing the frequency (in Hz) to which PULSE data should be interpolated. Can be set to \code{0} (zero) or any value equal to or greater than \code{40} (the default). If set to zero, no interpolation is performed.}

\item{bandwidth}{numeric, defaults to \code{0.2}; the bandwidth for the Kernel Regression Smoother. If equal to \code{0} (zero), no smoothing is applied. Normally kept low (\code{0.1 - 0.3}) so that only very high frequency noise is removed, but can be pushed up to \code{1} or above (especially when the heartbeat rate is expected to be slow, as is typical of oysters; in that case, double-check the resulting data). Type \code{?ksmooth} for additional info.}

\item{doublecheck}{logical, defaults to \code{TRUE}; should \code{\link[=pulse_doublecheck]{pulse_doublecheck()}} be used? (it is rare, but there are instances when it should be disabled).}

\item{lim_n}{numeric, defaults to \code{3}; minimum number of peaks detected in each time window for it to be considered a "keep".}

\item{lim_sd}{numeric, defaults to \code{0.75}; maximum value for the standard deviation of the time intervals between detected peaks for a time window to be considered a "keep".}

\item{raw_v_smoothed}{logical, defaults to \code{TRUE}; indicates whether or not to also compute heart rates before applying smoothing; this will increase the quality of the output but also double the processing time.}

\item{correct}{logical, defaults to \code{TRUE}; if \code{FALSE}, data points with \code{hz} values likely double the real value are flagged \strong{BUT NOT CORRECTED}. If \code{TRUE}, \code{hz} (as well as \code{data}, \code{n}, \code{sd} and \code{ci}) are corrected accordingly. Note that the correction is not reversible!}

\item{discard_channels}{character vector, defaults to \code{NULL}; the names of channels to be discarded from the analysis. \code{discard_channels} is forced to lowercase, but other than that, the \strong{exact} names must be provided. Discarding unused channels can greatly speed up the workflow!}

\item{keep_raw_data}{logical, defaults to \code{TRUE}. If set to \code{FALSE}, \verb{$data} is set to \code{FALSE} (i.e., raw data is discarded), dramatically reducing the disk space required to store the final output (usually by two orders of magnitude). HOWEVER, note that it will no longer be possible to use \code{pulse_plot_raw()} on the output!}

\item{show_progress}{logical, defaults to \code{TRUE}. If set to \code{TRUE}, progress messages will be provided.}
}
\value{
A tibble with nrows = (number of channels) * (number of windows in \code{pulse_data_split}) and 13 columns:
\itemize{
\item \code{i}, the order of each time window
\item \code{smoothed}, logical flagging smoothed data
\item \code{id}, PULSE channel IDs
\item \code{time}, time at the center of each time window
\item \code{data}, a list of tibbles with raw PULSE data for each combination of channel and window, with columns \code{time}, \code{val} and \code{peak} (\code{TRUE} in rows corresponding to wave peaks)
\item \code{hz}, heartbeat rate estimate (in Hz)
\item \code{n}, number of wave peaks identified
\item \code{sd}, standard deviation of the intervals between wave peaks
\item \code{ci}, confidence interval (hz ± ci)
\item \code{keep}, logical indicating whether data points meet N and SD criteria
\item \code{d_r}, ratio of consecutive asymmetric peaks
\item \code{d_f}, logical flagging data points where heartbeat frequency is likely double the real value
}
}
\description{
This function runs \code{PULSE()} file by file, instead of attempting to read all files at once. This is required when datasets are too large (more than 20-30 files), as otherwise the system may become stuck due to the amount of data that needs to be kept in memory. Because the results of processing each hourly file in the dataset are saved to a \code{job_folder}, \code{PULSE_by_chunks()} has the added benefit of allowing the entire job to be stopped and resumed, so that processing can continue from where it left off even if a crash occurs.
}
\examples{
##
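# A minimal sketch, not taken from the package's own examples.
# Assumptions: the package that provides PULSE_by_chunks() is attached, and
# "path/to/pulse_files" is a hypothetical folder holding several PULSE files
# (keep a backup copy, since a job_folder will be created inside it).
\dontrun{
pulse_folder <- "path/to/pulse_files"  # hypothetical path

# Process the dataset two files at a time and return one unified tibble
heart_rates <- PULSE_by_chunks(
  folder           = pulse_folder,
  allow_dir_create = TRUE,  # required, otherwise nothing is processed
  chunks           = 2,     # number of files processed per cycle
  bind_data        = TRUE   # read the job_folder back into a single tibble
)

# For very large datasets, bind_data = FALSE avoids loading everything
# into memory; the results stay in the job_folder for manual collation
PULSE_by_chunks(
  folder           = pulse_folder,
  allow_dir_create = TRUE,
  bind_data        = FALSE
)
}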
}
\seealso{
\itemize{
\item \code{\link[=PULSE]{PULSE()}} for all the relevant information about the processing of \code{PULSE} data
}
}
