# rray vs matrix - performance

I am currently exploring the use of `rray` as I find myself having to work with a matrix. I'm trying to better understand the vctrs paradigm with rray.

I've noticed a rather significant performance difference between a base matrix and an rray. Is there some principled behavior of an rray that leads to this? Or is this a consequence of the type consistency?

``````
# make word matrix
lyric <- matrix(rep(c("around", "the", "world"), 10), ncol = 10, nrow = 10)

# create matrix and rray to fill for self-sim
base_mat <- matrix(nrow = 10, ncol = 10)
rry <- rray::rray(NA, dim = c(10, 10))

# function for self_similarity matrix
loop <- function(mat) {
  mat_size <- nrow(mat)
  for (col in 1:mat_size) {
    for (row in 1:mat_size) {
      mat[row, col] <- (mat[row, col] <- lyric[row, col] == lyric[col, col])
    }
  }
  mat
}

# time
tictoc::tic()
b_self_sim <- loop(base_mat)
tictoc::toc()
#> 0.027 sec elapsed

tictoc::tic()
rr_self_sim <- loop(rry)
tictoc::toc()
#> 0.063 sec elapsed
``````

Created on 2020-01-04 by the reprex package (v0.3.0)

Whew, okay, there are two things at work here. Unfortunately, both are inherent R limitations, at least until 4.0.0 is released with reference counting.

I'm fairly certain the speed difference is not really due to the implementation of rray's sub-assignment function. It has more to do with the fact that base R can take advantage of a trick that lets it avoid copying `mat` every time an assignment is done with `[<-`. Let's look at a simpler example:

``````
library(rray)
library(profmem)

base_mat <- matrix(data = NA_real_, nrow = 10, ncol = 10)
rry <- rray(NA_real_, dim = c(10, 10))

fn <- function(x) {
  for (i in 1:100) {
    x[1, 1] <- 1
  }
}

profmem_to_tbl <- function(x) {
  out <- tibble::as_tibble(as.data.frame(x))
  # remove new page allocs
  out[!is.na(out$bytes), ]
}

profmem_to_tbl(profmem(fn(base_mat)))
#> # A tibble: 1 x 3
#>   what  bytes calls
#>   <chr> <dbl> <chr>
#> 1 alloc   848 fn()

profmem_to_tbl(profmem::profmem(fn(rry)))
#> # A tibble: 200 x 3
#>    what  bytes calls
#>    <chr> <dbl> <chr>
#>  1 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  2 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  3 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  4 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  5 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  6 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  7 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  8 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#>  9 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#> 10 alloc   848 fn() -> [<-() -> [<-.vctrs_rray() -> rray_subset_assign() -> rra…
#> # … with 190 more rows
``````

Created on 2020-01-04 by the reprex package (v0.3.0.9000)

Here R's `[<-` makes only 1 copy of `x` over the loop of 100 assignments. rray, on the other hand, has to make 2 copies per iteration, so 200 total. Those copies are what kill you. There are two reasons for this.

The first is that R's base matrices can use a trick. The first time that `x` has a `1` assigned to it, a copy is made. The next time it happens R recognizes that the fresh copy of `x` is not used anywhere else, so it does not make a copy and just reuses that memory. Unfortunately that feature is not available to package developers so rray has to make at least 1 copy on every iteration.
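If you want to watch this copy-then-reuse behavior yourself, base R's `tracemem()` prints a line each time the traced object is duplicated. The following is only a minimal sketch and not part of the reprex above: `fill_first_cell()` is a hypothetical helper name, and how many duplications get reported may depend on your R version and on whether R was built with memory profiling.

```r
# Hypothetical helper: assign into the same cell n times.
fill_first_cell <- function(x, n = 100) {
  for (i in seq_len(n)) {
    x[1, 1] <- 1
  }
  x
}

base_mat <- matrix(NA_real_, nrow = 10, ncol = 10)

# tracemem() only works when R was built with memory profiling support.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(base_mat))

# Any "tracemem[... -> ...]" lines printed here are duplications of the matrix.
res <- fill_first_cell(base_mat)

if (can_trace) {
  untracemem(base_mat)
  untracemem(res)
}

# The caller's matrix is untouched; only the function's copy was modified.
is.na(base_mat[1, 1])
res[1, 1]
```

The first assignment inside the function has to copy because `x` is still shared with `base_mat` in the calling environment. After that, the fresh copy belongs only to the function, which is what lets base R modify it in place.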

Side note: in R 4.0.0 there will be a "reference counting" feature that will allow package developers to keep track of the fact that `x` has not been "referenced" anywhere, and that it can be reused.

The other copy per iteration comes from the fact that, for whatever reason, `[<-` forces a copy on you if you have an S3 method for it. For matrices it drops straight into a C implementation, but for rray objects it has to go through `[<-.vctrs_rray`, forcing the second copy. There isn't anything I can do about that either.
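To see the S3 dispatch path in isolation, without rray involved, you can define a toy class whose `[<-` method does nothing except hand off to the internal default. This is a sketch under assumptions: `toy_array` and `new_toy_array()` are made-up names, the method is far simpler than rray's real one, and the number of duplications `tracemem()` reports will depend on the R version.

```r
# Hypothetical S3 class wrapping a base matrix; NOT rray's implementation.
new_toy_array <- function(x) {
  structure(x, class = "toy_array")
}

# A `[<-` method that only defers to the internal default method.
# Every sub-assignment now goes through this R-level function first.
`[<-.toy_array` <- function(x, ..., value) {
  NextMethod()
}

toy <- new_toy_array(matrix(NA_real_, nrow = 10, ncol = 10))

# tracemem() only works when R was built with memory profiling support.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(toy))

# Any "tracemem[... -> ...]" lines printed here are duplications of `toy`.
for (i in seq_len(100)) {
  toy[1, 1] <- 1
}

if (can_trace) untracemem(toy)

# The class survives the assignment and the cell was updated.
class(toy)
unclass(toy)[1, 1]
```

Comparing the trace output of this loop with the same loop on a plain matrix gives a rough feel for what the method dispatch alone costs, separate from anything rray does internally.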

Hopefully that gets better in R 4.0.0 too, but I'm not sure.
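In the meantime, one way to sidestep the per-assignment copies is to avoid element-wise sub-assignment and build the result in a single vectorized comparison. The sketch below uses base R only and assumes the comparison from the original loop (each cell against the diagonal element of its column) is what is wanted; `diag_by_col` is a name introduced here for illustration, and I have not benchmarked this against rray.

```r
# Same word matrix as in the question, with the recycling spelled out.
lyric <- matrix(
  rep(c("around", "the", "world"), length.out = 100),
  nrow = 10, ncol = 10
)

# Loop version from the question, with a single assignment per cell.
loop <- function(mat) {
  mat_size <- nrow(mat)
  for (col in seq_len(mat_size)) {
    for (row in seq_len(mat_size)) {
      mat[row, col] <- lyric[row, col] == lyric[col, col]
    }
  }
  mat
}

# Vectorized version: repeat each column's diagonal element down that column,
# then compare the two matrices in one step.
diag_by_col <- matrix(
  diag(lyric),
  nrow = nrow(lyric), ncol = ncol(lyric), byrow = TRUE
)
self_sim <- lyric == diag_by_col

looped <- loop(matrix(nrow = 10, ncol = 10))
identical(self_sim, looped)
```

Because the whole result is created at once, there is no repeated `[<-` call to trigger copies, whichever array class ends up holding the final object.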

Thank you! This is extremely helpful.
