a5R can parallelise vectorised operations using multiple threads via rayon. By default a5R uses a single thread, so there is zero overhead. You opt in to parallelism when you need it.

Setting the thread count

# Check the current setting (default: 1)
a5_get_threads()
#> [1] 1

# Use 4 threads
a5_set_threads(4)
a5_get_threads()
#> [1] 4

You can also set threads at package load time via an R option or environment variable - useful for scripts and batch jobs:

# In .Rprofile or at the top of a script
options(a5R.threads = 4)

# Or as an environment variable
# Sys.setenv(A5R_NUM_THREADS = 4) 
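A sketch of how such a load-time lookup might work, assuming the option takes precedence over the environment variable (the actual precedence in a5R may differ — `resolve_threads()` is a hypothetical helper, not part of the package):

```r
# Hypothetical lookup, illustrating one plausible precedence:
# option > environment variable > default of 1.
resolve_threads <- function() {
  opt <- getOption("a5R.threads")
  if (!is.null(opt)) return(as.integer(opt))
  env <- Sys.getenv("A5R_NUM_THREADS", unset = "")
  if (nzchar(env)) return(as.integer(env))
  1L
}
```

Whichever source wins, the result feeds into the same thread-pool setup as an explicit `a5_set_threads()` call.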

a5_set_threads() invisibly returns the previous value, making temporary changes easy:

old <- a5_set_threads(4)
# ... parallel work ...
a5_set_threads(old)
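The returned previous value also makes a scoped helper easy to write. `with_threads()` below is a sketch, not part of a5R; the `setter` argument exists only so the pattern is easy to test without the package:

```r
# Sketch of a scoped-threads helper (not part of a5R): evaluate `expr`
# with a temporary thread count, restoring the previous one even if
# `expr` throws. `setter` defaults to a5R's a5_set_threads().
with_threads <- function(n, expr, setter = a5_set_threads) {
  old <- setter(n)
  on.exit(setter(old), add = TRUE)  # restore on exit, error or not
  force(expr)                       # evaluate while `n` threads are active
}

# with_threads(8, a5_cell_to_boundary(cells, format = "wkt"))
```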

What gets parallelised

Threading applies to vectorised functions that process each element independently:

Function                  Per-element cost                      Benefit
a5_cell_to_boundary()     Heavy (boundary + WKT/WKB)            High
a5_grid()                 Heavy (boundary filtering)            High
a5_lonlat_to_cell()       Moderate (projection)                 High
a5_cell_distance()        Moderate (2x projection + distance)   Medium
a5_cell_to_lonlat()       Moderate (reverse projection)         Medium
a5_cell_to_parent()       Light (bit ops + hex)                 Low
a5_get_resolution()       Light (bit ops)                       Low
a5_is_cell()              Light (hex parse)                     Low

Scalar and bulk operations (a5_cell_to_children(), a5_compact(), a5_cell_area(), etc.) are unaffected — they are already fast or delegate to algorithms that don’t parallelise element-wise.

When is it worthwhile?

Threading has a small fixed overhead (thread synchronisation, memory allocation for intermediate results). For small vectors this can outweigh the benefit. As a rule of thumb:

  • < 1,000 elements: stick with 1 thread
  • 1,000–10,000 elements: 2–4 threads help for heavy ops (boundary generation, indexing)
  • > 10,000: use as many threads as you have cores
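The rule of thumb above can be wrapped in a small helper. `pick_threads()` is hypothetical, not an a5R function; `parallel::detectCores()` supplies the default ceiling:

```r
# Hypothetical helper (not part of a5R): choose a thread count from the
# vector length, following the rule of thumb above.
pick_threads <- function(n_elements, max_threads = parallel::detectCores()) {
  if (n_elements < 1000) return(1L)             # overhead dominates
  if (n_elements <= 10000) return(min(4L, max_threads))
  as.integer(max_threads)                       # big vectors: use everything
}

# a5_set_threads(pick_threads(length(cells)))
```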

Here’s a quick comparison on roughly 700k cells:

cells <- a5_grid(c(-10, 50, 10, 60), resolution = 12)
length(cells)
#> [1] 704259

a5_set_threads(1)
system.time(a5_cell_to_boundary(cells, format = "wkt"))
#>   user  system elapsed
#>  3.124   0.000   3.122 

a5_set_threads(8)
system.time(a5_cell_to_boundary(cells, format = "wkt"))
#>   user  system elapsed
#>  6.195   1.289   1.667 

Note that user time increases (total CPU work across all threads) while elapsed (wall-clock) time decreases — that’s the parallelism at work.
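From those timings you can put a number on the gain; the gap to ideal 8x scaling is consistent with the fixed overheads discussed above:

```r
# Speedup and per-thread efficiency implied by the timings above
elapsed_1 <- 3.122
elapsed_8 <- 1.667
speedup    <- elapsed_1 / elapsed_8   # wall-clock improvement, ~1.9x
efficiency <- speedup / 8             # fraction of ideal 8x scaling
round(c(speedup = speedup, efficiency = efficiency), 2)
```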

Thread safety

a5R uses a dedicated rayon thread pool, separate from R’s own parallelism, so it is safe to use alongside future, mirai, and similar frameworks. Be careful with nested parallelism, though: if the total thread count across workers exceeds your core count, oversubscription can degrade performance.
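As a rough budget when nesting, total concurrency is about workers times threads per worker, and keeping that near the core count avoids oversubscription. A sketch (the numbers are illustrative):

```r
# Rough thread budget when combining a5R threads with process-level
# workers (e.g. future or mirai) on one machine.
cores   <- parallel::detectCores()
workers <- 4
threads_per_worker <- max(1L, cores %/% workers)
# inside each worker: a5_set_threads(threads_per_worker)
```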

The thread pool is rebuilt each time you call a5_set_threads(), so changing the count mid-session is fine, but the rebuild is not free. Ideally, set it once at the start of your workflow rather than toggling it per call.