mirai workflows

crew is an all-inclusive wrapper around mirai that manages workers and tasks from one place. However, you can let mirai manage the tasks and use crew only to manage the workers. With crew’s customizable launcher plugin system, along with the pre-built plugins in crew.aws.batch and crew.cluster, you can deploy your mirai tasks to a wide range of computing environments.

How it works

First, create a crew controller with the “default” compute profile and seconds_idle = Inf.[1]

library(crew)
controller <- crew_controller_local(profile = "default", seconds_idle = Inf)

Next, launch one or more workers.

controller$launch(n = 1)

Submit a mirai task normally.[2]

library(mirai)
task <- mirai(1 + 1)

The task will start as soon as the worker connects to the controller. When the task completes, you can get the result with:

task$data
#> [1] 2
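
In a non-interactive script, the task may not have finished by the time you read task$data. As a minimal sketch, assuming the same controller setup as above, mirai’s unresolved() and call_mirai() can check on the task or wait for it:

```r
library(crew)
library(mirai)

# Same setup as above: one local worker on the "default" compute profile.
controller <- crew_controller_local(profile = "default", seconds_idle = Inf)
controller$launch(n = 1)

task <- mirai(1 + 1)

# unresolved() returns TRUE while the task is still pending or running.
still_running <- unresolved(task)

# call_mirai() blocks the R session until the task resolves.
call_mirai(task)
result <- task$data

# Errors come back as error values instead of being thrown locally.
if (is_error_value(result)) {
  stop("the mirai task failed: ", as.character(result))
}

controller$terminate()
```

Blocking with call_mirai() is convenient in scripts. In an interactive session you may prefer to poll with unresolved() so the console stays free.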

Below, a “completed” count greater than zero confirms that the task actually ran on the controller.[3]

controller$client$status()
#> connections  cumulative    awaiting   executing   completed 
#>           1           1           0           0           1 

To stop the workers, either close the local R session or terminate the controller.

controller$terminate()

Parallel functional programming

The pattern is the same with mirai-powered parallel purrr. First, create the controller and launch the workers.

library(crew)
controller <- crew_controller_local(profile = "default", seconds_idle = Inf)
controller$launch(n = 4)

Then, use the controller’s compute profile in mirai’s parallel purrr functions.

library(purrr)
seq_len(4) |> map(in_parallel(\(x) Sys.sleep(1))) # Takes about 1 second with 4 workers.
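
Functions wrapped in in_parallel() run on the workers, so they cannot see objects from your local session unless you pass those objects along. The sketch below assumes a version of purrr that provides in_parallel(), and the offset variable is hypothetical, chosen only for illustration:

```r
library(crew)
library(purrr)

controller <- crew_controller_local(profile = "default", seconds_idle = Inf)
controller$launch(n = 4)

# `offset` is a hypothetical local object used only for this illustration.
offset <- 10

# Named arguments to in_parallel() are sent to the workers with the function.
shifted <- seq_len(4) |>
  map_dbl(in_parallel(\(x) x + offset, offset = offset))

controller$terminate()
```

Passing dependencies explicitly keeps the function self-contained, which matters more once the workers run on another machine through a launcher plugin.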

Asynchronous parallel functional programming

mirai::mirai_map() schedules functional programming tasks without blocking the R session. The pattern is analogous to the purrr case.

library(crew)
library(mirai)
controller <- crew_controller_local(profile = "default", seconds_idle = Inf)
controller$launch(n = 4)
tasks <- mirai_map(seq_len(4), \(x) Sys.sleep(10))
tasks
#> < mirai map [0/4] > # The tasks are still running.
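
When you eventually need the values, you can collect them from the map object. This is a minimal sketch, assuming the same controller setup and a toy function that returns a value instead of sleeping:

```r
library(crew)
library(mirai)

controller <- crew_controller_local(profile = "default", seconds_idle = Inf)
controller$launch(n = 4)

# Returns immediately; the tasks run on the workers in the background.
tasks <- mirai_map(seq_len(4), \(x) x^2)

# Subsetting with [] waits for all tasks and returns their results as a list.
results <- tasks[]

controller$terminate()
```

Collecting with tasks[] blocks the session, so it fits best at the point where the rest of your code cannot proceed without the results.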

Auto-scaling

crew can automatically scale workers in response to demand from mirai tasks. To enable this, we configure the controller differently:

controller <- crew_controller_local(
  profile = "default",
  seconds_idle = 30, # Workers will terminate after 30 seconds of idleness.
  workers = 4        # No more than 4 workers will run at one time.
)

The autoscale() method runs an asynchronous later loop that launches new workers in the background.

controller$autoscale()

This later-based auto-scaling is not compatible with either of the functional programming sections above, but it can accommodate individual tasks.

task <- mirai(1 + 1)
# After waiting a few seconds:
task$data
#> [1] 2

To deactivate the auto-scaling loop:

controller$descale()

Caveats and limitations


  1. Or a compute profile you will supply to the .compute argument of mirai::mirai().

  2. If you didn’t set the “default” compute profile in the controller, you can set it in mirai() with .compute = controller$profile.

  3. These counts are the result of mirai::info(.compute = controller$profile). The “connections” and “cumulative” counts are for workers, and the rest are for tasks.