Matrix::crossprod() induces excessive garbage collection

I've used a combination of code profiling and embedded timestamping in a large application to identify the parts taking up an overwhelming amount of the execution time. It seems that a call to Matrix::crossprod() is inducing excessive garbage collection that is taking up to 90% of the application execution time.

This short reprex is partially illustrative; I can't find a way to replicate the separation of getting the result from the call to apply(), followed by the induced gc.

In my application, I see the response to toc() immediately, followed by the gc messages.

The matrix u is large; is the gc induced by crossprod() inevitable, or are there ways round it? R version 4.0 claims better gc, would this help significantly? Any suggestions would be appreciated.

library(Matrix)
library(tictoc)
gcinfo(verbose = TRUE)
u <- Matrix(rnorm(10e6), 1e6)
tic()
v <- apply(u, 1, crossprod)
toc()
> library(Matrix)
> library(tictoc)
> gcinfo(verbose = TRUE)
# [1] FALSE
> u <- Matrix(rnorm(10e6), 1e6)
> tic()
> v <- apply(u, 1, crossprod)

#### In my application the toc() call responds here,
#### followed by the gc messages below.

# Garbage collection 28 = 15+3+10 (level 0) ... 
# 253.9 Mbytes of cons cells used (68%)
# 1071.6 Mbytes of vectors used (79%)
# Garbage collection 29 = 16+3+10 (level 0) ... 
# 256.8 Mbytes of cons cells used (69%)
# 1071.9 Mbytes of vectors used (79%)
# Garbage collection 30 = 17+3+10 (level 0) ... 
# 259.7 Mbytes of cons cells used (70%)
# 1072.2 Mbytes of vectors used (79%)
# Garbage collection 31 = 18+3+10 (level 0) ... 
# 262.5 Mbytes of cons cells used (71%)
# 1072.5 Mbytes of vectors used (79%)
# Garbage collection 32 = 19+3+10 (level 0) ... 
# 265.2 Mbytes of cons cells used (71%)
# 1072.7 Mbytes of vectors used (79%)
# Garbage collection 33 = 20+3+10 (level 0) ... 
# 267.9 Mbytes of cons cells used (72%)
# 1073.0 Mbytes of vectors used (79%)
# Garbage collection 34 = 21+3+10 (level 0) ... 
# 270.5 Mbytes of cons cells used (73%)
# 1073.2 Mbytes of vectors used (79%)
# Garbage collection 35 = 22+3+10 (level 0) ... 
# 273.0 Mbytes of cons cells used (73%)
# 1073.5 Mbytes of vectors used (79%)
# Garbage collection 36 = 23+3+10 (level 0) ... 
# 275.5 Mbytes of cons cells used (74%)
# 1073.7 Mbytes of vectors used (79%)
# Garbage collection 37 = 24+3+10 (level 0) ... 
# 277.9 Mbytes of cons cells used (75%)
# 1073.9 Mbytes of vectors used (79%)
# Garbage collection 38 = 25+3+10 (level 0) ... 
# 280.2 Mbytes of cons cells used (75%)
# 1074.2 Mbytes of vectors used (79%)
# Garbage collection 39 = 26+3+10 (level 0) ... 
# 282.5 Mbytes of cons cells used (76%)
# 1074.4 Mbytes of vectors used (79%)
# Garbage collection 40 = 27+3+10 (level 0) ... 
# 284.7 Mbytes of cons cells used (77%)
# 1074.6 Mbytes of vectors used (79%)
# Garbage collection 41 = 28+3+10 (level 0) ... 
# 286.9 Mbytes of cons cells used (77%)
# 1074.8 Mbytes of vectors used (79%)
# Garbage collection 42 = 29+3+10 (level 0) ... 
# 289.0 Mbytes of cons cells used (78%)
# 1075.0 Mbytes of vectors used (79%)
# Garbage collection 43 = 30+3+10 (level 0) ... 
# 291.1 Mbytes of cons cells used (78%)
# 1075.2 Mbytes of vectors used (79%)
# Garbage collection 44 = 31+3+10 (level 0) ... 
# 293.1 Mbytes of cons cells used (79%)
# 1075.4 Mbytes of vectors used (79%)
# Garbage collection 45 = 32+3+10 (level 0) ... 
# 295.0 Mbytes of cons cells used (79%)
# 1075.6 Mbytes of vectors used (79%)
# Garbage collection 46 = 33+3+10 (level 0) ... 
# 296.9 Mbytes of cons cells used (80%)
# 1075.7 Mbytes of vectors used (79%)
# Garbage collection 47 = 34+3+10 (level 0) ... 
# 298.8 Mbytes of cons cells used (80%)
# 1075.9 Mbytes of vectors used (79%)
# Garbage collection 48 = 34+4+10 (level 1) ... 
# 300.6 Mbytes of cons cells used (81%)
# 1076.0 Mbytes of vectors used (79%)
# Garbage collection 49 = 34+4+11 (level 2) ... 
# 300.5 Mbytes of cons cells used (50%)
# 1075.0 Mbytes of vectors used (66%)
# Garbage collection 50 = 34+4+12 (level 2) ... 
# 308.0 Mbytes of cons cells used (51%)
# 1075.7 Mbytes of vectors used (66%)
# Garbage collection 51 = 35+4+12 (level 0) ... 
# 315.2 Mbytes of cons cells used (53%)
# 1076.4 Mbytes of vectors used (66%)
# Garbage collection 52 = 36+4+12 (level 0) ... 
# 322.3 Mbytes of cons cells used (54%)
# 1077.0 Mbytes of vectors used (66%)
# Garbage collection 53 = 37+4+12 (level 0) ... 
# 329.2 Mbytes of cons cells used (55%)
# 1077.7 Mbytes of vectors used (66%)
# Garbage collection 54 = 38+4+12 (level 0) ... 
# 335.9 Mbytes of cons cells used (56%)
# 1078.3 Mbytes of vectors used (66%)
# Garbage collection 55 = 39+4+12 (level 0) ... 
# 342.5 Mbytes of cons cells used (57%)
# 1079.0 Mbytes of vectors used (66%)
# Garbage collection 56 = 40+4+12 (level 0) ... 
# 348.9 Mbytes of cons cells used (58%)
# 1079.6 Mbytes of vectors used (66%)
# Garbage collection 57 = 41+4+12 (level 0) ... 
# 355.1 Mbytes of cons cells used (59%)
# 1080.2 Mbytes of vectors used (66%)
# Garbage collection 58 = 42+4+12 (level 0) ... 
# 361.2 Mbytes of cons cells used (60%)
# 1080.7 Mbytes of vectors used (67%)
# Garbage collection 59 = 43+4+12 (level 0) ... 
# 367.1 Mbytes of cons cells used (61%)
# 1081.3 Mbytes of vectors used (67%)
# Garbage collection 60 = 44+4+12 (level 0) ... 
# 372.9 Mbytes of cons cells used (62%)
# 1081.9 Mbytes of vectors used (67%)
# Garbage collection 61 = 45+4+12 (level 0) ... 
# 378.6 Mbytes of cons cells used (63%)
# 1082.4 Mbytes of vectors used (67%)
# Garbage collection 62 = 46+4+12 (level 0) ... 
# 384.1 Mbytes of cons cells used (64%)
# 1082.9 Mbytes of vectors used (67%)
# Garbage collection 63 = 47+4+12 (level 0) ... 
# 389.4 Mbytes of cons cells used (65%)
# 1083.4 Mbytes of vectors used (67%)
# Garbage collection 64 = 48+4+12 (level 0) ... 
# 394.6 Mbytes of cons cells used (66%)
# 1083.9 Mbytes of vectors used (67%)
# Garbage collection 65 = 49+4+12 (level 0) ... 
# 399.7 Mbytes of cons cells used (67%)
# 1084.4 Mbytes of vectors used (67%)
# Garbage collection 66 = 50+4+12 (level 0) ... 
# 404.7 Mbytes of cons cells used (68%)
# 1084.9 Mbytes of vectors used (67%)
# Garbage collection 67 = 51+4+12 (level 0) ... 
# 409.5 Mbytes of cons cells used (68%)
# 1085.3 Mbytes of vectors used (67%)
> toc()
# 13.045 sec elapsed
> 

A few things come to mind that do not provide an explanation but may bear:

  1. Lagging R by a major release may not only be leaving cycles on the table, but packages may rot with time
  2. Introspection of the gc process might, itself, slow down the running time
  3. Although gc may be doing its job, side effects may still arise from the operating system failing to free up resources.
1 Like

What's the etiquette on cross-posting? When would it be acceptable to repost this (or similar) to https://r-forge.r-project.org/tracker/?group_id=61?

forum guidance on crossposting can be read here

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.