These changes invalidate certain targets in a pipeline and cause them to rerun on the next tar_make().
tar_repository_cas() output strings to reduce the size of pipeline metadata (#1390).
tar_format() output strings to reduce the size of pipeline metadata (#1390).
tar_make() and tar_outdated() run much faster in this release. Extensive profiling was done on a real-world simulation pipeline with 66002 up-to-date targets. For tar_make() using all the default settings:
Machine | Before (seconds) | After (seconds) | Speedup |
---|---|---|---|
M2 Macbook | 413.16 | 35.538 | 11.62587 |
RHEL9 | 450.66 | 94.08 | 4.790 |
And for tar_outdated() using all the default settings:
Machine | Before (seconds) | After (seconds) | Speedup |
---|---|---|---|
M2 Macbook | 91.314 | 16.636 | 5.48894 |
RHEL9 | 167.809 | 37.395 | 4.487472 |
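The Speedup column in both tables is the ratio of the "Before" time to the "After" time. A minimal base R sketch that reproduces it from the timings quoted above (the data frame and its column names are only illustrative):

```r
# Timings in seconds, copied from the two tables above.
timings <- data.frame(
  fun = c("tar_make", "tar_make", "tar_outdated", "tar_outdated"),
  machine = c("M2 Macbook", "RHEL9", "M2 Macbook", "RHEL9"),
  before = c(413.16, 450.66, 91.314, 167.809),
  after = c(35.538, 94.08, 16.636, 37.395)
)

# Speedup is the old run time divided by the new run time.
timings$speedup <- timings$before / timings$after

print(timings)
```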
To take advantage of these speed gains for an existing pipeline, you may have to run tar_make() to convert the time stamps and file sizes to a new format. This initial tar_make() is slow, but subsequent tar_make() calls should be much faster than before the upgrade.
tar_make()
and tar_outdated()
by
avoiding excessive buffering and disk writes for metadata and reporters
when the pipeline is just skipping targets.tar_runtime$file_info
(#1398)."forecast_interactive"
reporter to
tar_outdated()
to choose "forecast"
for
interactive sessions and "silent"
for non-interactive
ones.seconds_reporter_outdated
argument to
tar_config_set()
with a default of 1 to control the time
interval of the reporter of tar_outdated()
and other
passive algorithm functions.path
vectors with cloud metadata (#1382, @n8layman).ps::ps_disk_partitions()
and
ps::ps_fs_mount_point()
._targets/objects/
paths in metadata for
CAS repositories (#1391).igraph
>= 2.1.2.format = "file_fast"
(#1339, @koefoeden).error = "trim"
(#1340, @koefoeden).garbage_collection
to be a non-negative integer
to control the frequency of garbage collection in a performant,
convenient, unified way (#1351).garbage_collection
argument of
tar_make()
, tar_make_future()
, and
tar_make_clustermq()
(#1351).target_run()
, target_prepare()
,
and target_conclude()
using autometric
."vctrs_error_subscript_oob"
to rlang::abort()
(#1354, @Jiefei-Wang).store_assert_format()
and
store_convert_object()
if storage
is
"none"
.list()
method to
tar_repository_cas()
to make it easier and more efficient
to specify custom CAS repositories (#1366).memory
is
"transient"
(#1364).memory
class with the new
lookup
class.memory = "auto"
to select transient memory
for dynamic branches and persistent memory for other targets
(#1371).retrieval
is "main"
and only a bud is actually
used. The same cannot be done with branches because each branch may need
to be (un)marshaled individually.retrieval
is
"worker"
and the whole pattern is part of the
subpipeline.format = "qs"
from
qs
to qs2
(#1373).tar_unblock_process()
."keepNA"
and "keepInteger"
to
.deparseOpts()
(#1375). This may cause existing pipelines
to rerun, but it makes add-ons like tarchetypes::tar_map()
much easier to use.tar_watch()
UI module in
bslib::page()
(#1302, @kwbyron-lilly).callr_function
in tar_make_as_job()
argument list.storage = "worker"
is respected when the process
of storing an object generates an error (#1304, @multimeric)._targets.R
pattern in
tar_branches()
(#1306, @multimeric, @mattwarkentin).tar_prune()
(#1312, @benzipperer).workspace_on_error
option to
TRUE
(#1310, @hadley).error = "stop"
error
message._targets/objects
for
error = "null"
. Instead, switch to a special
"null"
storage format class if error
is
"null"
and the target throws an error. This should allow users
to more freely create new formats with tar_format()
without
worrying about how to handle NULL
objects created by
error = "null"
.format = "auto"
(#1311, @hadley).pingr
dependency with
base::socketConnection()
for local URL utilities (#1317,
#1318, @Adafede).tar_repository_cas()
,
tar_repository_cas_local()
, and
tar_repository_cas_local_gc()
for content-addressable
storage (#1232, #1314, @noamross).tar_format_get()
to make implementing CAS systems
easier.error = "trim"
in tar_target()
and tar_option_set()
(#1310, #1311, @hadley).format = "file_fast"
in favor of the above
(#1315).trust_object_timestamps
in favor of the more
unified trust_timestamps
in tar_option_set()
(#1315).tar_target()
and
tar_target_raw()
. Same with tar_load()
and
tar_load_raw()
.substitute
argument to tar_format()
to make it easier to write custom storage formats without
metaprogramming.bslib
in tar_watch()
.target_upstream_edges()
and
pipeline_upstream_edges()
by avoiding data frames until the
last minute (17% speedup for certain kinds of large pipelines).as_job
to FALSE
in
tar_make()
if rstudioapi
and/or RStudio is not
available.secretbase::siphash13()
instead of
digest(algo = "xxhash64", serializationVersion = 3)
so
hashes of in-memory objects no longer depend on serialization version 3
headers (#1244, @shikokuchuo). Unfortunately, pipelines
built with earlier versions of targets
will need to
rerun.targets
and changes to the package will cause
the current work to rerun (#1244). For the tar_make*()
functions, utils::menu()
prompts the user to give people a
chance to downgrade if necessary.data.table::fread()
, then convert them
to the correct types afterwards.tar_resources_custom_format()
function which
can pass environment variables to customize the behavior of custom
tar_format()
storage formats (#1263, #1232, @Aariq, @noamross).extras
in tar_renv()
.tar_target()
gains a description
argument
for free-form text describing what the target is about (#1230, #1235,
#1236, @tjmahr).tar_visnetwork()
, tar_glimpse()
,
tar_network()
, tar_mermaid()
, and
tar_manifest()
now optionally show target descriptions
(#1230, #1235, #1236, @tjmahr).tar_described_as()
is a new wrapper around
tidyselect::any_of()
to select specific subsets of targets
based on the description rather than the name (#1136, #1196, @noamross, @mattmoo).names
argument (nudge
users toward tidyselect
expressions).arrow
-related CRAN check NOTE.
use_targets() only writes the _targets.R script. The run.sh and run.R scripts are superseded by the as_job argument of tar_make(). Users not using the RStudio IDE can call tar_make() with callr_function = callr::r_bg to run the pipeline as a background process.
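As a rough sketch of the background-process approach described above, the call can be wrapped in a small helper. The helper name run_pipeline_in_background() is hypothetical, and the sketch assumes the targets and callr packages are installed and that a _targets.R file exists in the working directory when the helper is called:

```r
# Hypothetical helper: run the pipeline in a background R process
# instead of relying on the run.sh / run.R scripts.
run_pipeline_in_background <- function() {
  # callr::r_bg() launches a separate R process without blocking the session.
  targets::tar_make(callr_function = callr::r_bg)
}

# Inside the RStudio IDE, the as_job argument is the alternative:
# targets::tar_make(as_job = TRUE)
```

The helper is only defined here, not called, so nothing runs until a project with a pipeline is in place.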
tar_make_clustermq() and tar_make_future() are superseded in favor of tar_make(use_crew = TRUE), so template files are no longer written for the former automatically.
Because of the changes below, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
tar_seed_create()
, use
secretbase::sha3(x = TARGET_NAME, bits = 32L, convert = NA)
to generate target seeds that are more resistant to overlapping RNG
streams (#1139, @shikokuchuo). The previous approach
used a less rigorous combination of
digest::digest(algo = "sha512")
and
digest::digest2int()
.deployment
argument of
tar_target()
to reflect the advent of crew
(#1208, @psychelzh).cli.num_colors
on exit in
tar_error()
and tar_warning()
(#1210, @dipterix).seconds_timeout
if the
crew
controller is actually a controller group (#1207,
https://github.com/wlandau/crew.cluster/discussions/35, @stemangiola, @drejom).tar_make()
gains an as_job
argument to
optionally run a targets
pipeline as an RStudio job.igraph
version to 2.0.0 because
igraph::get.edgelist()
was deprecated in favor of
igraph::as_edgelist()
.crew
controllers
(or controller groups) (#1220). Use the new push_backlog()
and pop_backlog()
crew
methods to make this
smooth.tar_make()
if
there is already a targets
pipeline running on a local
process on the same local data store. The local process is detected
using the process ID and time stamp from tar_process()
(with a 1.01-second tolerance for the time stamp).pkgload::load_all()
warning (#1218). Tried using
.__DEVTOOLS__
but it interferes with reverse
dependencies.tar_target_raw()
to let users know that iteration = "group"
is invalid for
dynamic targets (ones with pattern = map(...)
etc.; #1226,
@bmfazio).clustermq
version to 0.9.2.tar_debug_instructions()
tips for when
commands are long.
Because of the changes below, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
tar_seed_create()
help file for
details and justification. Unfortunately, this change will invalidate
all currently built targets because the seeds will be different. To
avoid rerunning your whole pipeline, set
cue = tar_cue(seed = FALSE)
in
tar_target()
.targets:::digest_chr64()
in both cases before
storing the result in the metadata.targets
now tries to ensure that the up-to-date data
objects in the cloud are in their newest versions. So if you roll back
the metadata to an older version, you will still be able to access
historical data versions with e.g. tar_read()
, but the
pipeline will no longer be up to date.tar_seed_create()
which
creates target-specific pseudo-random number generator seeds.tar_seed_create()
help file to justify and defend how targets
and
tarchetypes
approach pseudo-random numbers.tar_seed_set()
which sets a seed and sets
all the RNG algorithms to their defaults in the R installation of the
user. Each target now uses tar_seed_set()
function to set
its seed before running its R command (#1139).tar_seed()
in favor of the new
tar_seed_get()
function.tar_delete()
,
tar_destroy()
, and tar_prune()
now use
efficient batched calls to delete_objects()
instead of
costly individual calls to delete_object()
(#1171).verbose
argument to
tar_delete()
, tar_destroy()
, and
tar_prune()
.batch_size
argument to
tar_delete()
, tar_destroy()
, and
tar_prune()
.page_size
and verbose
to
tar_resources_aws()
(#1172).tar_unversion()
function to remove version
IDs from the metadata of cloud targets. This makes it easier to interact
with just the current version of each target, as opposed to the version
ID recorded in the local metadata.clustermq
0.9.0 (@mschubert).tar_started()
in favor of
tar_dispatched()
(#1192).tar_built()
in favor of
tar_completed()
(#1192).crew
scheduling algorithm no longer waits on
saturated controllers, and targets that are ready are greedily
dispatched to crew
even if all workers are busy (#1182,
#1192). To appropriately set expectations for users, reporters print
“dispatched (pending)” instead of “dispatched” if the task load is
backlogged at the moment.crew
scheduling algorithm, waiting for tasks is
now a truly event-driven process and consumes 5-10x less CPU resources
(#1183). Only the auto-scaling of workers uses polling (with an
inexpensive default polling interval of 0.5 seconds, configurable
through seconds_interval
in the controller).tar_config_projects()
and
tar_config_yaml()
(#1153, @psychelzh).builder_wait_correct_hash()
in
target_conclude.tar_builder()
(#1154, @gadenbuie).builder_error_null()
.tar_meta_upload()
and
tar_meta_download()
to avoid errors if one or more metadata
files do not exist. Add a new argument strict
to control
error behavior.meta
, progress
,
process
, and crew
to control individual
metadata files in tar_meta_upload()
,
tar_meta_download()
, tar_meta_sync()
, and
tar_meta_delete()
.crew
0.5.0.9003 (https://github.com/wlandau/crew/issues/131).
tar_read()
etc. inside a pipeline whenever it
uses a different data store (#1158, @MilesMcBain).seed = FALSE
in future::future()
(#1166, @svraka).physics
argument to
tar_visnetwork()
and tar_glimpse()
(#925,
@Bdblodgett-usgs).
Because of these changes, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
hash_deps()
method of the metadata class,
exclude symbols which are not actually dependencies, rather than just
giving them empty strings. This change decouples the dependency hash
from the hash of the target’s command (#1108).tar_make()
, tar_make_clustermq()
, and
tar_make_future()
(#1109). Upload them to the repository
specified in the repository_meta
tar_option_set()
option, and use the bucket and prefix set
in the resources
tar_option_set()
option.
repository_meta
defaults to the existing
repository
tar_option_set()
option.tar_meta_download()
,
tar_meta_upload()
, tar_meta_sync()
, and
tar_meta_delete()
to directly manage cloud metadata outside
the pipeline (#1109).tempdir()
for #1103.path_scratch_dir_network()
to
file.path(tempdir(), "targets")
and make sure
tar_destroy("all")
and tar_destroy("cloud")
delete it.tar_mermaid()
subgraphs with transparent fills
and black borders.database$get_data()
to work with list
columns.tarchetypes
literate programming target factories like
tar_render()
and tar_quarto()
.hash_deps()
method of the metadata class, use a
new custom sort_chr()
function which temporarily sets the
LC_COLLATE
locale to "C"
for sorting. This
ensures lexicographic comparisons are consistent across platforms
(#1108).tar_source()
, use the file
argument and
keep.source = TRUE
to help with interactive debugging
(#1120).seconds_interval
in
tar_config_get()
, tar_make()
,
tar_make_clustermq()
and tar_make_future()
.
Replace it with seconds_meta
(to control how often metadata
gets saved) and seconds_reporter
(to control how often to
print messages to the R console) (#1119).seconds_meta
and seconds_reporter
for writing metadata and console messages even for currently building
targets (#1055).googleAuthR
(#1112).format = "url"
, only retry on the HTTP error codes
above.seconds_interval
and
seconds_timeout
from tar_resources_url()
, and
implement max_tries
arguments in
tar_resources_aws()
and tar_resources_gcp()
(#1127).file
and keep.source
in
parse()
in callr
utils and target
Markdown."file_fast"
format to
"file"
format for cloud targets.tar_prune()
and tar_delete()
, do not
try to delete pattern targets which have no cloud storage.seconds_timeout
,
close_connection
, s3_force_path_style
to
tar_resources_aws()
to support the analogous arguments in
paws.storage::s3()
(#1134, @snowpong).tar_prune_list()
(#1090, @mglev1n).file.rename()
in tryCatch()
and fall
back on a copy-then-remove workaround (@jds485, #1102, #1103).tools::R_user_dir(package = "targets", which = "cache")
instead of tempdir()
.
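The rename-with-fallback behavior mentioned above (file.rename() inside tryCatch(), then a copy-then-remove workaround) might look roughly like the following base R sketch. The helper name move_file() and the demo paths are hypothetical, and this is not the package's internal code:

```r
# Hypothetical helper: move a staged file, falling back on copy-then-remove
# when a plain rename fails (for example across file systems).
move_file <- function(from, to) {
  dir.create(dirname(to), recursive = TRUE, showWarnings = FALSE)
  # file.rename() usually signals failure with FALSE plus a warning,
  # so treat both errors and FALSE as "rename did not work".
  renamed <- tryCatch(
    suppressWarnings(file.rename(from, to)),
    error = function(condition) FALSE
  )
  if (!isTRUE(renamed)) {
    copied <- file.copy(from, to, overwrite = TRUE)
    if (!copied) {
      stop("could not move ", from, " to ", to)
    }
    unlink(from)
  }
  invisible(to)
}

# Demo with temporary files standing in for a scratch file and a data store.
staged <- tempfile()
writeLines("staged data", staged)
destination <- file.path(tempdir(), "objects_demo", "target_name")
move_file(staged, destination)
```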
tar_destroy(destroy = "cloud")
and
tar_destroy(destroy = "all")
remove any leftover files from
failed uploads/downloads (@jds485, #1102, #1103).paws.storage
instead of all of
paws
.crew
integrationcrew
controllers._targets.R
file from
use_targets()
.tar_crew()
compatible with crew
>=
0.3.0.terminate
to
terminate_controller
in tar_make()
.use_crew
in tar_make()
and
add an option in tar_config_set()
to make it
configurable.target_prepare()
.label
and
level_separation
arguments through
tar_config_set()
(#1085, @Moohan).nanonext
usage in
time_seconds_local()
at runtime and not installation time.
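One way to picture a runtime check for an optional dependency is the sketch below. The function name elapsed_seconds() is hypothetical and is not the package's time_seconds_local(); it only illustrates deciding at call time whether nanonext is available, with base R as the fallback:

```r
# Hypothetical helper: elapsed seconds from an arbitrary reference point.
elapsed_seconds <- function() {
  # The check happens each time the function runs, not when it is defined,
  # so removing nanonext later does not break the function.
  if (requireNamespace("nanonext", quietly = TRUE)) {
    nanonext::mclock() / 1000 # mclock() reports milliseconds
  } else {
    unname(proc.time()["elapsed"])
  }
}

start <- elapsed_seconds()
Sys.sleep(0.1)
duration <- elapsed_seconds() - start
```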
That way, if nanonext
is removed after targets
is installed, functions in targets
still work. Fixes the
CRAN issues seen in tarchetypes
, jagstargets
,
and gittargets
.crew
-related startup messages.cli
colors and bullets to improve
performance in RStudio.packageStartupMessage()
for package startup
messages.crew
is used.gc()
more appropriately when
garbage_collection
is TRUE
in
tar_target()
.garbage_collection
arguments to
tar_make()
, tar_make_clustermq()
, and
tar_make_future()
to add optional garbage collection before
targets are sent to workers. This is different and independent from the
garbage_collection
argument of tar_target()
.
In high-performance computing scenarios, the former controls what
happens on the main controlling process, whereas the latter controls
what happens on the worker.garbage_collection
and
seconds_interval
arguments to tar_make()
,
tar_make_clustermq()
, tar_make_future()
, and
tar_config_set()
.tar_runtime
object."file_fast"
format and the
trust_object_timestamps
option in
tar_option_set()
as safer alternatives.crew
controller groups (#1065, @mglev1n).tar_backoff()
. The backoff
argument of
tar_option_set()
now accepts output from
tar_backoff()
, and supplying a numeric is deprecated.crew
scheduling algorithm.tar_resources_network()
to configure retries
and timeouts for internal HTTP/HTTPS requests in specialized targets
with format = "url"
, repository = "aws"
, and
repository = "gcp"
. Also applies to syncing target files
across network file systems in the case of
storage = "worker"
or format = "file"
, which
previously had a hard-coded seconds_interval = 0.1
and
seconds_timeout = 60
.seconds_interval
and
seconds_timeout
in tar_resources_url()
in
favor of the new equivalent arguments of
tar_resources_network()
crew
controller when
the controller is saturated (#1074, @mglev1n).crew
controller.paws.common
(@DyfanJones)._targets/objects/
in
tar_callr_inner_try()
and update the cache as targets are
saved to _targets/objects/
to avoid the overhead of
repeated calls to file.exists()
and
file.info()
(#1056)._targets/objects/
are up to date (#1062).
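The trade-off between trusting time stamps and recomputing hashes can be sketched in base R as follows. The helper file_is_current(), the recorded list, and the use of MD5 are assumptions made for illustration; they are not how targets stores or checks its metadata:

```r
# Hypothetical helper: decide whether a stored file still matches its record.
file_is_current <- function(path, recorded, trust_timestamps = TRUE) {
  info <- file.info(path)
  unchanged <- identical(as.numeric(info$mtime), recorded$mtime) &&
    identical(as.numeric(info$size), recorded$size)
  # Fast path: matching time stamp and size are taken as proof of no change.
  if (trust_timestamps && unchanged) {
    return(TRUE)
  }
  # Slow path: ignore the time stamp and recompute the hash.
  identical(unname(tools::md5sum(path)), recorded$hash)
}

path <- tempfile()
writeLines("target value", path)
info <- file.info(path)
recorded <- list(
  mtime = as.numeric(info$mtime),
  size = as.numeric(info$size),
  hash = unname(tools::md5sum(path))
)

fast <- file_is_current(path, recorded)
slow <- file_is_current(path, recorded, trust_timestamps = FALSE)
```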
tar_option_set(trust_object_timestamps = FALSE)
ignores the
timestamps and recomputes the hashes._targets/meta/meta
and
_targets/meta/progress
in timed batches instead of line by
line (#1055).tempfile()
when working with the scratch
directory.nanonext::mclock()
instead of
proc.time()
when there is no risk of forked processes.withr
with slightly faster/leaner base R
alternatives.setwd()
(#1057).tar_options
methods in the internals instead of
tar_option_get()
.gsub()
in store_init()
.meta$get_record()
in
builder_should_run()
.cli::col_none()
to reduce the number of ANSI
characters printed to the R console.
targets is moving to version 1.0.0 because it is significantly more mature than previous versions. Specifically, tar_make() now integrates with crew, which will significantly improve the way targets does high-performance computing going forward.
targets has stabilized. There is still room for smaller new features, but none as large as crew integration, none that will fundamentally change how the package operates.
crew package in tar_make() (#753). crew itself is still in its early stages and currently lacks the launcher plugins to match the clustermq and future backends, but long-term, crew will be the predominant high-performance computing backend.
store_copy_object()
to the store class to
enable "fst_dt"
and other formats to make deep copies when
needed (#1041, @MilesMcBain).copy
argument to allow
tar_format()
formats to set the
store_copy_object()
method (#1041, @MilesMcBain).tar_format()
when
default methods are used.change_directory
argument to
tar_source()
(#1040, @dipterix).format = "url"
targets, implement retries and
timeouts when connecting to URLs. The default timeout is 10 seconds, and
the default retry interval is 1 second. Both are configurable via
tar_resources_url()
(#1048).parallelly::freePort()
in
tar_random_port()
.tar_script()
example pipeline (#1033, @b-rodrigues).tar_destroy()
help file (#988, @Sage0614).destroy = "user"
in
tar_destroy()
.#!/bin/sh
line to the top of SLURM
clustermq
template file (#944, #955, @GiuseppeTT).tar_path_script()
.tar_store()
to tar_path_store()
with deprecation.tar_path()
to tar_path_target()
with deprecation.tar_path_script_support()
.tar_option_set()
now supports a seed
argument,
and target-specific seeds are determined by
tar_option_get("seed")
and the target name.
tar_option_set(seed = NA)
disables seed-setting behavior
but forcibly invalidates all the affected targets except when
seed
is FALSE
in the target’s
tar_cue()
(#882, @sworland-thyme, @joelnitta).seed
argument in tar_cue()
to
control whether targets update in response to changing or
NA
seeds (#882, @sworland-thyme, @joelnitta).tar_github_actions()
workflow file to use
@v2
(#960, @kulinar).callr_function
is
NULL
(#961)."feather"
, "parquet"
,
"file"
, and "url"
work with
error = "null"
(#969)."keras"
and "torch"
superseded by tar_format()
. Documented in the
tar_target()
help file."keras"
and "torch"
incompatible with error = "null"
. Documented in the
tar_target()
help file and in a warning thrown by
tar_target()
via tar_target_raw()
.convert
argument to tar_format()
to
allow custom store_convert_object()
methods (#970).any_of()
instead of all_of()
in tests
to ensure compatibility with tidyselect
1.1.2.9000 (#928,
@hadley).run.R
from use_targets()
executable (#929, @petrbouchal).#!/usr/bin/env Rscript
to the top of
run.R
from use_targets()
(#929, @petrbouchal).skip_on_cran()
to avoid
https://github.com/r-lib/testthat/issues/1470#issuecomment-1248145555.names
argument of
tar_make()
does not identify any such targets in the
pipeline (#923, @llrs)..packageName
, .__NAMESPACE__.
, and
.__S3MethodsTable__.
when importing objects from packages
with the imports
option of
tar_option_set()
.imports
option of
tar_option_set()
(#926, @joelnitta).tar_read()
and
tar_load()
when the data store is missing.command
column of tar_manifest()
output, separate lines with “” instead of “\n” so the text output is
straightforward to work with.drop_missing
argument to
tar_manifest()
to hide/show columns with all
NA
values.paws
functions via
...
in tar_resources_aws()
(#855, @michkam89).tar_source()
to conveniently source R scripts
(e.g. in _targets.R
).targets
messages the default theme
color, and color warnings and errors red (#856, @gorkang).use_targets()
.tar_option_get("resources")
(#892). See the revised
"Resources"
section of the tar_resources()
help file for details.legend
and color
to further
configure tar_mermaid()
(#848, @noamross).use_targets()
now creates a job.sh
script to run the pipeline as a
cluster job (#839).use_targets()
. Avoids
defining a global variable for the file.use_targets()
_targets.R
file.tar_mermaid()
graph ordering.tar_mermaid()
graphs to avoid JavaScript keywords.data.table::fread()
with encoding equal to
getOption("encoding")
if available (#814, @svraka). Only works with
UTF-8 and latin1 because that is what data.table
supports.use_targets()
now writes a _targets.R
file
tailored to the project in the current working directory (#639, @noamross).use_targets()
to
use_targets_rmd()
.getOption("OutDec")
is not
"."
to prevent time stamps from being corrupted (#433,
@jarauh).tar_load_everything()
to quickly
load all targets (#823, @malcolmbarrett).
tar_target(..., repository = "gcp")
(#720, @markedmondson1234). Special
thanks to @markedmondson1234 for the cloud
storage utilities in R/utils_gcp.R
mermaid.js
static graphs with
tar_mermaid()
(#775, @yonicd).tar_target(..., error = "null")
to allow
errored targets to return NULL
and continue (#807, @zoews). Errors are still
registered, those targets are not up to date, and downstream targets
have an easier time continuing on.tar_assert_finite()
.tar_destroy()
, tar_delete()
, and
tar_prune()
now attempt to delete cloud data for the
appropriate targets (#799). In addition,
tar_exist_objects()
and tar_objects()
now
report about target data in the cloud when applicable. Add a new
cloud
argument to each function to optionally suppress this
new behavior.zoom_speed
argument to
tar_visnetwork()
and tar_glimpse()
(#749,
@dipterix)."verbose"
, "verbose_positives"
,
"timestamp"
, and "timestamp_positives"
reporters."aws_*"
storage format values in favor of
a new repository
argument (#803). In other words,
tar_target(..., format = "aws_qs")
is now
tar_target(..., format = "qs", repository = "aws")
. And
internally, storage classes with multiple inheritance are created
dynamically as opposed to having hard-coded source files. All this paves
the way to add new cloud storage platforms without combinatorial
chaos."tar_nonexportable"
to
format = "aws_keras"
and format = "aws_torch"
stores.tar_make_interactive_load_target()
.tar_target(format = tar_format(...))
(#736).tar_call()
to return the
targets
function currently running (from
_targets.R
or a target).tar_active()
to tell whether the
pipeline is currently running. Detects if it is called from
tar_make()
or similar function.Sys.getenv("TAR_PROJECT")
to the output of
tar_envvars()
.store
field of tar_runtime
prior
to sourcing _targets.R
so tar_store()
works in
target scripts.tar_envvars()
to targets run on parallel workers.format = "file"
targets to return
character(0)
(#728, @programLyrique).git checkout
a
different branch of your code and all your targets will stay up to
date.paws
(#711).
region argument to tar_resources_aws() to allow the user to explicitly declare a region for each AWS S3 bucket (@caewok, #681). Different buckets can now have different regions. This feature required modifying the metadata path for AWS storage formats. Before, the first element of the path was simply the bucket name. Now, it is internally formatted like "bucket=BUCKET:region=REGION", where BUCKET is the user-supplied bucket name and REGION is the user-supplied region name. The new targets is back-compatible with the old metadata format, but if you run the pipeline with targets >= 0.8.1.9000 and then downgrade to targets <= 0.8.1, any AWS targets will break.
"timestamp_positives"
and
"verbose_positives"
that omit messages for skipped targets
(@psanker,
#683).tar_assert_file()
.tar_reprex()
for creating easier reproducible
examples of pipelines.tar_store()
to get the path to the store of
the currently running pipeline (#714, @MilesMcBain)._targets/user/
folder to
encourage gittargets
users to put custom files there for
data version control.tar_path()
uses the current store path of the
currently running pipeline instead of
tar_config_get("store")
(#714, @MilesMcBain)..gitignore
file inside the data
store to allow the metadata to be committed to version control more
easily (#685, #711).tar_target()
and
tar_target_raw()
(@tjmahr, #679).target_should_run.tar_builder()
. These kinds of errors
sometimes come up with AWS storage._targets/.gitignore
for new data stores so
the user can delete the .gitignore
file without it
mysteriously reappearing (#685).strict
and silent
to allow
tar_load()
and tar_load_raw()
to bypass
targets that cannot be loaded.tidyselect
docs in tar_make()
(#640, @dewoller).tar_dir()
in
tar_test()
(#642, @billdenney).tar_assert_target_list()
error message (@kkami1115, #654).tar_destroy()
and related cleanup
functions (@billdenney, #675).tar_target(target_name, ..., format = "aws_file")
.
Previously, _targets/objects/target_name
was also hashed if
it existed.tar_config_unset()
function to delete
one or more configuration settings from the YAML configuration
file.TAR_CONFIG
environment variable to set
the default file path of the YAML configuration file with project
settings (#622, @yyzeng, @atusy, @nsheff, @wdkrnls). If TAR_CONFIG
is not
set, the file path is still _targets.yaml
.config
package) and support the TAR_PROJECT
environment variable to select the current active project for a given R
session. The old single-project format is gracefully deprecated (#622,
@yyzeng, @atusy, @nsheff, @wdkrnls).retrieval = "none"
and
storage = "none"
to anticipate loading/saving targets from
other languages, e.g. Julia (@MilesMcBain).tar_definition()
function to get the target
definition object of the current target while that target is running in
a pipeline.tar_path()
now returns
the path to the staging file instead of
_targets/objects/target_name
. This ensures you can still
write to tar_path()
in storage = "none"
targets and the package will automatically hash the right file and
upload it to the cloud. (This behavior does not apply to formats
"file"
and "aws_file"
, where it is never
necessary to set storage = "none"
.)
eval(parse(text = ...), envir = tar_option_get("envir"))
instead of source()
in the _targets.R
file for
Target Markdown.RecordBatch
and Table
(@MilesMcBain).knitr
load the Target Markdown engine (#469, @nviets, @yihui). Minimum
knitr
version is now 1.34.
tar_resources_future()
help file, encourage the
use of plan
to specify resources.error = "continue"
does not cause errored
targets to have NULL
values.knitr
engine).poll_connection
, stdout
, and
stderr
arguments of callr::r_bg()
in
tar_watch()
(@mpadge).tar_started()
, tar_skipped()
,
tar_built()
, tar_canceled()
, and
tar_errored()
.tar_interactive()
,
tar_noninteractive()
, and tar_toggle()
to
differentially suppress code in non-interactive and interactive mode in
Target Markdown (#607, @33Vito).future
errors within targets (#570, @stuvet).message
knitr
chunk option is
FALSE
(#574, @jmbuhr).tar_interactive
is not set,
choose interactive vs non-interactive mode based on
isTRUE(getOption("knitr.in.progress"))
instead of
interactive()
.tar_poll()
to lose and then regain connection to
the progress file.tar_group
column of
iteration = "group"
data frames do not invalidate slices
(#507, @lindsayplatt).tar_interactive
global
option to select interactive mode or non-interactive mode (#469).degree_from
and
degree_to
of tar_visnetwork()
and
tar_glimpse()
(#474, @rgayler).tar_config_set()
(#476).tar_script
chunk option in Target Markdown to
control where the {targets}
language engine writes the
target script and helper scripts (#478).script
and store
to
choose custom paths to the target script file and data store for
individual function calls (#477).targets
backends. Unavoidably, the path gets reset to
_targets.yaml
when the session restarts._targets.yaml
config options
reporter_make
, reporter_outdated
, and
workers
to control function argument defaults shared across
multiple functions called outside _targets.R
(#498, @ianeveperry).tar_load_globals()
for debugging, testing,
prototyping, and teaching (#496, @malcolmbarrett).resources
argument of
tar_target()
to avoid conflicts among formats and HPC
backends (#489). Includes user-side helper functions like
tar_resources()
and tar_resources_aws()
to
build the required data structures._targets/meta/progress
and
display them in tar_progress()
, tar_poll()
,
tar_watch()
, tar_progress_branches()
,
tar_progress_summary()
, and tar_visnetwork()
(#514). Instead of writing each skip line separately to
_targets/meta/progress
, accumulate skip lines in a queue
and then write them all out in bulk when something interesting happens.
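The queue-then-flush idea described here can be sketched with a small closure in base R. The constructor new_line_queue() and the "name|skipped" line format are made up for illustration and do not reflect the real layout of _targets/meta/progress:

```r
# Hypothetical sketch: accumulate lines in memory and write them in one batch.
new_line_queue <- function(path) {
  buffer <- character(0)
  list(
    push = function(line) {
      # Cheap in-memory append, no disk access.
      buffer <<- c(buffer, line)
      invisible(length(buffer))
    },
    flush = function() {
      # A single write for all queued lines, each ending in a newline.
      if (length(buffer) > 0L) {
        cat(paste0(buffer, "\n"), file = path, sep = "", append = TRUE)
        buffer <<- character(0)
      }
      invisible(path)
    }
  )
}

progress_file <- tempfile()
queue <- new_line_queue(progress_file)
for (name in c("x", "y", "z")) {
  queue$push(paste0(name, "|skipped"))
}
queue$flush()
```

Three queued lines cost one file write instead of three, which is where the saving comes from when many targets are skipped in a row.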
This avoids a lot of overhead in certain cases.shortcut
argument to tar_make()
,
tar_make_clustermq()
, tar_make_future()
,
tar_outdated()
, and tar_sitrep()
to more
efficiently skip parts of the pipeline (#522, #523, @jennysjaarda, @MilesMcBain, @kendonB).names
and shortcut
in graph data
frames and graph visuals (#529).allow
and exclude
to the network
behind the graph visuals rather than the visuals themselves (#529).tar_watch()
app to
show verbose progress info and metadata.workspace_on_error
argument of
tar_option_set()
to supersede
error = "workspace"
. Helps control workspace behavior
independently of the error
argument of
tar_target()
(#405, #533, #534, @mattwarkentin, @xinstein).error = "abridge"
in
tar_target()
and related functions. If a target errors out
with this option, the target itself stops, any currently running targets
keep running, and no new targets launch after that (#533, #534, @xinstein).
tar_destroy()
which can be
suppressed with TAR_ASK = "false"
(#542, @gofford).tar_older()
and
tar_newer()
to help users identify and invalidate targets
at regular times or intervals.targets
chunk option
in favor of tar_globals
(#469).error = "workspace"
in
tar_target()
and related functions. Use
tar_option_set(workspace_on_error = TRUE)
instead (#405,
#533, @mattwarkentin, @xinstein).clustermq
worker (@rich-payne).store_sync_file_meta.default()
on small files.tar_watch()
, take several measures to avoid long
computation times rendering the graph:
display
and displays
to
tar_watch()
so the user can select which display shows
first."summary"
the default display instead of
"graph"
.outdated
to FALSE
by default.tar_read()
for targets with
format = "aws_file"
, download the file back to the path the
user originally saved it when the target ran.TAR_MAKE_REPORTER
environment variable with
targets::tar_config_get("reporter_make")
.eval(parse(text = readLines("_targets.R")), envir = some_envir)
and related techniques instead of the less controllable
source()
. Expose an envir
argument to many
functions for further control over evaluation if
callr_function
is NULL
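As a rough illustration of the `eval(parse())` technique named in this item, the base R sketch below evaluates a script inside an environment chosen by the caller. The temporary file and the variable `pipeline_value` are hypothetical stand-ins for `_targets.R` and its contents, and the sketch is not meant to mirror the internals of `targets`.

```r
# Write a small stand-in script (hypothetical contents, not a real _targets.R).
script <- tempfile(fileext = ".R")
writeLines(c("pipeline_value <- 1 + 1", "pipeline_value * 10"), script)

# Evaluate the script in an environment we control.
some_envir <- new.env(parent = globalenv())
result <- eval(parse(text = readLines(script)), envir = some_envir)

# The assignment lands in some_envir rather than the calling environment.
defined_in_envir <- exists("pipeline_value", envir = some_envir, inherits = FALSE)
```

Here `result` holds the value of the last expression in the script, and any objects the script creates stay inside `some_envir`, which is the kind of control that a plain `source()` call with default arguments does not give.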
.out.attrs
when hashing groups of data frames to
extend #507 to expand.grid()
(#508).targets
.GITHUBPAT
to GITHUB_TOKEN
in the
tar_github_actions()
YAML file (#554, @eveyp).eval
chunk option in Target Markdown (#552,
@fkohrt).time
column for all
builder targets, regardless of storage format._targets.yaml
to
parallel workers.exclude
argument to
tar_watch()
and tar_watch_server()
(#458,
@gorkang)..gitignore
file to ignore everything in
_targets/meta/
except .gitignore
and
_targets/meta/meta
.knitr
engines for pipeline
construction and prototyping from within literate programming documents
(#469, @cderv, @nviets, @emilyriederer, @ijlyttle, @GShotwell, @gadenbuie, @tomsing1). Huge thanks to
@cderv on this one for
answering my deluge of questions, helping me figure out what was and was
not possible in knitr
, and ultimately circling me back to a
successful approach.use_targets()
, which writes the Target
Markdown template to the project root (#469).tar_unscript()
to clean up scripts written by
Target Markdown.tar_make()
and
tar_manifest()
.pattern = slice()
or pattern = sample()
are
invalid.tar_target_raw()
, assert that commands have length 1
when converted to expressions.tar_cue()
(@maelle).dplyr
groups and "grouped_df"
class
in tar_group()
(tarchetypes
discussion #53,
@kendonB).tar_read()
and tar_read_raw()
._targets.yaml
). Fixes CRAN check errors from version
0.4.1.roxygen2
docstrings from
shiny
.Suggests:
packages.targets.yaml
in
the callr
process.file.rename()
errors when migrating staged
temporary files (#410).assert_df()
from
store_assert_format()
instead of
store_cast_object()
. And now those last two functions are
not called at all if the target throws an error.tar_poll()
at the same time as the pipeline (#393).tar_renv()
to
_targets_packages.R
(#397).outdated = FALSE
in tar_visnetwork()
.tar_timestamp()
and
tar_timestamp_raw()
to get the last modified timestamp of a
target’s data (#378).tar_progress_summary()
to compactly summarize
all pipeline progress (#380).characters
argument of
tar_traceback()
to cap the traceback line lengths
(#383).tar_watch()
(#382).tar_poll()
to repeatedly poll runtime
progress in the R console (#381). tar_poll()
is a
lightweight alternative to tar_watch()
.tar_envvar()
function to list values of special
environment variables supported in targets
. The help file
explains each environment variable in detail._targets.yaml
(#297). New functions
tar_config_get()
and tar_config_set()
interact
with the _targets.yaml
file. Currently only supports the
store
field to set the data store path to something other
than _targets/
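A minimal sketch of how the `store` field might be set and read back. It assumes a recent version of `targets` in which `tar_config_set()` and `tar_config_get()` accept a `config` argument. The temporary file and the store name `custom_store` are hypothetical and are used only to avoid touching a real project's `_targets.yaml`.

```r
library(targets)

# Hypothetical configuration file, kept outside any real project.
config_file <- tempfile(fileext = ".yaml")

# Record a custom data store path in the YAML configuration file.
tar_config_set(store = "custom_store", config = config_file)

# Read the setting back from the same file.
store_path <- tar_config_get("store", config = config_file)
```

In an actual project, omitting `config` makes both functions work with `_targets.yaml` in the project root.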
.deployment = "main"
(#398, #399, #404, @pat-s).tar_traceback()
(#383).tar_watch()
, use shinybusy
instead of
shinycssloaders
and keep current output on display while
new output is rendering (#386, @rcorty).AWS_DEFAULT_REGION
environment variable
(check_region = TRUE
; #400, @tomsing1).tar_meta()
, return POSIXct
times in the
time zone of the calling system (#131).qs::qread()
now that
qs
0.24.1 requires stringfish
>= 1.5.0
(#147, @glep).pattern = slice(...)
can take multiple indexes (#406, #419, @djbirke, @alexpghayes).queue$enqueue()
is now queue$prepend()
and
always adds items to the front of the queue (#371).devtools::load_all()
or similar is
detected inside _targets.R
(#374).feather
and parquet
tests on
CRAN.backoff
option in tar_option_set()
to set the upper bound (in seconds) for the polling interval
(#333).tar_github_actions()
function to write a
GitHub Actions workflow file for continuous deployment of data analysis
pipelines (#339, @jaredlander).TAR_MAKE_REPORTER
environment variable to
globally set the reporter of the tar_make*()
functions
(#345, @alexpghayes).tar_make_clustermq()
and
tar_make_future()
(#333).tar_make_future()
, try to submit a target every time
a worker is polled.tar_make_future()
, poll workers in order of target
priority.targets
internal objects out of the environment in order to
avoid accidental massive data transfers to workers.rlang::check_installed()
inside
assert_package()
(#331, @malcolmbarrett).tar_destroy(destroy = "process")
.tar_watch()
, increase default seconds
to 15 (previously 5).tar_watch()
, debounce instead of throttle
inputs.tar_watch()
, add an action button to refresh the
outputs.tar_make()
. Will help
compute a cache key on GitHub Actions and similar services.tar_deduplicate()
due to the item above.tar_target_raw()
,
tar_meta()
, and tar_seed()
(#357, @alexpghayes).%||%
and %|||%
to
conform to historical precedent.reporter = "silent"
(#364, @matthiasgomolka).envir
element.tar_load()
, subset metadata to avoid accidental
attempts to load global objects in tidyselect
calls.vctrs::vec_c()
(#320, @joelnitta).names
argument to tar_objects()
and tar_workspaces()
with tidyselect
functionality.targets
version) in _targets/meta/process
and
write new functions tar_process()
and
tar_pid()
to retrieve the data (#291, #292).targets_only
argument to
tar_meta()
.tar_helper()
and
tar_helper_raw()
to write general-purpose R scripts, using
tidy evaluation as a template mechanism (#290, #291, #292,
#306).tar_exist_meta()
,
tar_exist_objects()
, tar_exist_progress()
, tar_exist_script()
(#310).supervise
argument to
tar_watch()
.complete_only
argument to
tar_meta()
to optionally return only complete rows (no
NA
values).callr
errors and refer users to the debugging
chapter of the manual.crayon
if and only if the
calling process is interactive (#302, @ginolhac). Can still be disabled with
options(crayon.enabled = FALSE)
in
_targets.R
.format = "url"
when the HTTP response status code is not 200 (#303, @petrbouchal).extras
packages to tar_renv()
(to
support tar_watch()
).tar_watch()
if _targets.R
does not exist.names
argument of
tar_load()
(#314, @jameelalsalam).nobody
in custom curl
handles (#315, @riazarbi).targets
is somehow actively
monitoring each job, e.g. through a connection or heartbeat (#318).errormode = "warn"
in getVDigest()
for
files to work around https://github.com/eddelbuettel/digest/issues/49
for network drives on Windows. targets
already runs those
file checks anyway. (#316, @boshek).targets
tried to load from.tar_test()
now skips all tests on Solaris in order to
fix the problems shown on the CRAN check page.allow
and exclude
to work on
imports in tar_visnetwork()
and
tar_glimpse()
.visNetwork
legends on right to avoid crowding the
graph.force()
on subpipeline objects to eliminate
high-memory promises in target objects. Allows targets to be deployed to
workers much faster when retrieval
is "main"
(#279).tar_watch()
app to tabulate
progress on dynamic branches (#273, @mattwarkentin).type
, parent
, and
branches
in progress data for tar_watch()
(#273, @mattwarkentin).fields
argument in tar_progress()
and default to "progress"
for back compatibility (#273,
@mattwarkentin).tar_progress_branches()
function to tabulate
branch progress (#273, @mattwarkentin).tar_watch()
to toggle
automatic refreshing and force a refresh..Random.seed
by default in
tar_visnetwork()
.tar_watch()
app.clustermq
tests on Solaris.if(FALSE)
blocks from help files to fix
“unexecutable code” warnings (tar_glimpse()
,
tar_visnetwork()
, and tar_watch()
).tar_edit()
,
tar_watch_ui()
, and tar_watch_server()
).tar_workspace()
.)CITATION
._targets.R
(#253).tar_pipeline()
and tar_bind()
because of the above (#253).visNetwork
stabilization (#264, @mattwarkentin).visNetwork
font size.error
is
"continue"
(#267, @liutiming).tar_bind()
(#245, @yonicd).igraph
topological
sort.tar_manifest()
(#263,
@sctyner).workspaces
argument to
tar_option_set()
to specify which targets will save their
workspace files during tar_make()
(#214).error = "save"
to
error = "workspace"
so it is clearer that saving
workspaces no longer duplicates data (#214).what
to destroy
in
tar_destroy()
.tar_undebug()
because it is redundant with
tar_destroy(destroy = "workspaces")
.head()
,
tail()
, and sample()
to provide functionality
equivalent to drake
’s max_expand
(#56).tar_pattern()
function to emulate dynamic
branching outside a pipeline.level_separation
argument to
tar_visnetwork()
and tar_glimpse()
to control
the aspect ratio (#226).imports
argument to tar_option_set()
(#239).outdated
is
FALSE
in tar_visnetwork()
.tar_visnetwork()
to try to account for
color blindness.tar_manifest()
.tar_renv()
now invokes _targets.R
through
a background process just like tar_outdated()
etc. so it
can account for more hidden packages (#224, @mattwarkentin).deployment
equal to "main"
for all
targets in tar_make()
. This ensures tar_make()
does not waste time waiting for nonexistent files to ship over a
nonexistent network file system (NFS). tar_make_clustermq()
or tar_make_future()
could use NFS, so they still leave
deployment
alone.size
field to the metadata to allow
targets
to make better judgments about when to rehash files
(#180). We now compare hashes to check file size differences instead of
doing messy floating point comparisons with ad hoc tolerances. This breaks
backward compatibility with old projects, but the error message is
informative, and this is all still before the first official
release.storage
, retrieval
, and
deployment
settings (#183, @mattwarkentin).garbage_collection
to a target-level setting,
i.e. argument to tar_target()
and
tar_option_set()
(#194). Previously, this was an argument to the
tar_make*()
functions.tar_name()
and tar_path()
to run
outside the pipeline with debugging-friendly default return values.storage
is "remote"
(#182, @mattwarkentin).target$subpipeline
rather than
target$cache
to make that happen (#209, @mattwarkentin).tar_bind()
to combine pipeline
objects.tar_seed()
to get the random number generator seed
of the target currently running.future::plan()
s through the
resources
argument of tar_target()
(#198,
@mattwarkentin).library()
instead of require()
in
command_load_packages()
.targets$cache$targets$envir
to improve convenience in
interactive debugging (ls()
just works now.) This is
reasonably safe now that the cache is populated at the last minute and
cleared as soon as possible (#209, #210).
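
For the `library()` versus `require()` item above, the base R sketch below shows one practical difference between the two functions: `require()` returns `FALSE` with a warning when a package is missing, while `library()` throws an error. The package name is hypothetical, chosen so that it is almost certainly not installed, and the sketch says nothing about how `command_load_packages()` itself is written.

```r
# Hypothetical package name that should not exist on any system.
pkg <- "aPackageThatDoesNotExist12345"

# require() signals a warning and returns FALSE, so execution continues.
require_result <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() throws an error, which is captured here for inspection.
library_result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
```

With `library()`, a missing package stops the target immediately instead of surfacing later as a harder-to-trace failure.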