Civis Scripts are the way to productionize your code with Civis Platform. You’ve probably used three of the four types of scripts already in the Civis Platform UI (“Code” –> “Scripts”): language (R, Python3, JavaScript, and SQL), container, and custom. If you’ve run any of these scripts in Civis Platform, you’ve already started productionizing your code. Loosely speaking, productionizing means that your code now runs on a remote server instead of your local or development machine.
You probably already know some of the benefits too:
This guide covers how to use the API to programmatically do the same tasks that you are used to doing in the GUI. Instead of typing in values for the parameters or clicking to download outputs, you can do the same thing in your programs. Hooray for automation!
Specifically, this guide will cover how to programmatically read outputs, kick off new script runs, and publish your own script templates to share your code with others. It will make heavy use of API functions directly, but highlight convenient wrappers for common tasks where they have been implemented already.
Ready? Buckle in!
A script is a job that executes code in Civis Platform. A script accepts user input through parameters, gives values back to the user as run outputs, and records any logs along the way.
A script author can share language and container scripts with others by letting users clone the script. But if an author makes a change to the script such as fixing a bug or adding a feature, users will have to re-clone the script to get access to those changes.
A better way to share code with others is with template scripts. A template script is a ‘published’ language or container script. The script that the template runs is the backing script of the template.
Once a container or language script is published as a template, users can create their own instances of the template. These instances are called custom scripts and they inherit all changes made to the template. This feature makes it easy to share code with others and to rapidly deploy changes and fixes.
# create a container script with a parameter
script <- scripts_post_containers(
  required_resources = list(cpu = 1024, memory = 50, diskSpace = 15),
  docker_command = 'cd /package_dir && Rscript inst/run_script.R',
  docker_image_name = 'civisanalytics/datascience-r',
  name = 'SCRIPT NAME',
  params = list(
    list(name = 'NAME_OF_ENV_VAR',
         label = 'Name User Sees',
         type = 'string',
         required = TRUE)
  )
)
# publish the container script as a template
template <- templates_post_scripts(script$id, name = 'TEMPLATE NAME', note = 'Markdown Docs')
# run a template script, returning file ids of run outputs
out <- run_template(template$id)
# post a file or JSONValue run output within a script
write_job_output('filename.csv')
json_values_post(jsonlite::toJSON(my_list), 'my_list.json')
# get run output file ids of a script
out <- fetch_output_file_ids(civis_script(id))
# get csv run outputs of a script
df <- read_civis(civis_script(id), regex = '.csv', using = read.csv)
# get JSONValue run outputs
my_list <- read_civis(civis_script(id))
Let’s make these concepts concrete with an example! We’ll use the ‘R’ language script throughout, but container scripts work exactly the same way. In the second section, we’ll cover custom and template scripts.
The post method creates the job and returns a list of metadata about it, including its type.
Each script can be uniquely identified by its job id. If you have a job id but don’t know what kind of script it is, you can do jobs_get(id).
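As a rough sketch, the lookup could be wrapped in a small helper. The name script_type is hypothetical, and the sketch assumes the metadata returned by jobs_get exposes the type in a field called type; check the returned list if yours differs. The civis:: prefix is used only so the definition does not depend on the package being attached.

```r
# Hypothetical helper: return the type recorded for any job id.
# Assumes the list returned by jobs_get() has a `type` element.
script_type <- function(id) {
  job <- civis::jobs_get(id)
  job$type
}
```

Calling script_type(id) with a real job id requires a valid Civis API key in your environment.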
Each script type is associated with its own API endpoints. For instance, to post a job of each script type, you need scripts_post_r, scripts_post_containers, scripts_post_custom, or templates_post_scripts.
This job hasn’t been run yet. To kick off a run do:
run <- scripts_post_r_runs(job$id)
# check the status
scripts_get_r_runs(job$id, run$id)
# automatically poll until the job completes
await(scripts_get_r_runs, id = job$id, run_id = run$id)
Since kicking off a job and polling until it completes is a really common task for this guide, let’s make it a function:
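A minimal sketch of such a helper follows, built only from the calls shown above (scripts_post_r, scripts_post_r_runs, scripts_get_r_runs, and await). The default name 'Cool' mirrors the later version of run_script in this guide, and the civis:: prefixes are an assumption of this sketch so that the definition does not depend on the package being attached.

```r
# Sketch: create an R script job, start a run, and wait for it to finish.
run_script <- function(source, name = 'Cool') {
  # Create the job from the source code string.
  job <- civis::scripts_post_r(name = name, source = source)
  # Kick off a run of the new job.
  run <- civis::scripts_post_r_runs(job$id)
  # Poll until the run completes and return the final run metadata.
  civis::await(civis::scripts_get_r_runs, id = job$id, run_id = run$id)
}
```

With a valid API key, run <- run_script(source) would then post the job, start a run, and block until the run finishes.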
This script isn’t very useful because it doesn’t produce any output that we can access. To add an output to a job, we can use scripts_post_r_runs_outputs. The two most common types of run outputs are Files and JSONValues.
We can specify adding a File as a run output by uploading the object to S3 with write_civis_file and setting object_type in scripts_post_r_runs_outputs to File. Notice that the environment variables CIVIS_JOB_ID and CIVIS_RUN_ID are automatically inserted into the environment for us to have access to.
source <- c("
library(civis)
data(iris)
write.csv(iris, 'iris.csv')
job_id <- as.numeric(Sys.getenv('CIVIS_JOB_ID'))
run_id <- as.numeric(Sys.getenv('CIVIS_RUN_ID'))
file_id <- write_civis_file('iris.csv')
scripts_post_r_runs_outputs(job_id, run_id, object_type = 'File', object_id = file_id)
")
run <- run_script(source)
Since this pattern is so common, we replaced it with the function write_job_output which you can use to post a filename as a run output for any script type.
It is best practice to make run outputs as portable as possible because the script can be called by any language. For arbitrary data, JSONValues are often the best choice. Regardless, it is user friendly to add the file extension to the name of the run output.
Adding JSONValue run outputs is common enough for it to be implemented directly as a Civis API endpoint, json_values_post:
source <- c("
library(civis)
library(jsonlite)
my_farm <- list(cows = 1, ducks = list(mallard = 2, goldeneye = 1))
json_values_post(jsonlite::toJSON(my_farm), name = 'my_farm.json')
")
run_farm <- run_script(source)
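To illustrate why JSON is a portable choice, here is a small local sketch that serializes the same list and reads it back, with no Civis API calls involved. The auto_unbox = TRUE option is a choice made for this sketch (it writes length-one vectors as JSON scalars) and is not used in the script above.

```r
library(jsonlite)

my_farm <- list(cows = 1, ducks = list(mallard = 2, goldeneye = 1))

# Serialize to a JSON string; auto_unbox writes length-one vectors as scalars.
farm_json <- toJSON(my_farm, auto_unbox = TRUE)

# Any JSON-aware language could parse this string; here we read it back in R.
restored <- fromJSON(farm_json)

# The nested structure survives the round trip.
restored$ducks$mallard
```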
To retrieve script outputs we can use scripts_list_r_runs_outputs:
out <- scripts_list_r_runs_outputs(run$rId, run$id)
# the list endpoint returns one entry per run output; take the first (and only) one
iris <- read_civis(out[[1]]$objectId, using = read.csv)
Since this pattern is also common, you can simply use read_civis directly. This will work for any script type.
Use regex and using to filter run outputs by file extension, and provide the appropriate reading function. JSONValues can be read automatically.
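One detail worth keeping in mind when writing the pattern: in a regular expression, an unescaped dot matches any character. The sketch below uses base R's grepl on hypothetical output names to show the difference; it assumes that regex is applied to run output names as an ordinary R regular expression, which is not confirmed here.

```r
# Hypothetical run output names.
output_names <- c('iris.csv', 'my_farm.json', 'notes_csv_draft.txt')

# In a regular expression '.' matches any character, so '.csv' also matches '_csv'.
loose <- grepl('.csv', output_names)

# Escaping the dot and anchoring at the end matches only the .csv extension.
strict <- grepl('\\.csv$', output_names)

output_names[loose]
output_names[strict]
```

If several outputs have similar names, the stricter pattern '\\.csv$' is the safer filter.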
Scripts are more useful if their behavior can be configured by the user, which can be done with script parameters. Script parameters are placeholders for input by the user. Specific values of the parameters input by the user are called arguments. Here, we modify run_script to automatically add a parameter, and simultaneously take a value of that parameter provided by the user. In the script itself, we can access the parameter as an environment variable.
# Add 'params' and 'arguments' to run_script
run_script <- function(source, args, name = 'Cool') {
  params <- list(          # params is a list of individual parameters
    list(
      name = 'PET_NAME',   # name of the environment variable with the user value
      label = 'Pet Name',  # name displayed to the user
      type = 'string',     # type
      required = TRUE      # required?
    )
  )
  job <- scripts_post_r(name = name,
                        source = source,
                        params = params,
                        arguments = args)
  run <- scripts_post_r_runs(job$id)
  await(scripts_get_r_runs, id = job$id, run_id = run$id)
}
# Access the PET_NAME variable
source <- c('
library(civis)
pet_name <- Sys.getenv("PET_NAME")
msg <- paste0("Hello, ", pet_name, "!")
print(msg)
')
# Let's run it! Here we pass the argument 'Fitzgerald' to the
# parameter 'PET_NAME' that we created.
run_script(source, name = 'Pet Greeting', args = list(PET_NAME = 'Fitzgerald'))
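Because arguments reach the script as environment variables, the script body can be tried locally before posting it. The sketch below simulates this with Sys.setenv; N_PETS is a hypothetical second parameter, added only to show that values arrive as strings and need explicit conversion, just as CIVIS_JOB_ID was converted earlier.

```r
# Simulate the argument that Civis Platform would place in the environment.
Sys.setenv(PET_NAME = 'Fitzgerald')

# Read the parameter and build a greeting with a separating comma and space.
pet_name <- Sys.getenv('PET_NAME')
msg <- paste0('Hello, ', pet_name, '!')
print(msg)

# Environment variables are always strings, so convert numeric values explicitly.
Sys.setenv(N_PETS = '3')  # N_PETS is a hypothetical parameter for illustration
n_pets <- as.numeric(Sys.getenv('N_PETS'))
```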
That’s it! Now go forth and productionize!