A question has been bugging me. I know R uses lazy evaluation: function arguments are wrapped in promises and are not evaluated until their values are actually needed.
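To make concrete what I mean by lazy evaluation, here is a minimal base R sketch. The helper names (`events`, `note`, `f`, `g`) are made up for illustration and have nothing to do with dplyr itself.

```r
# Record the order in which things happen.
events <- character()
note <- function(msg) events <<- c(events, msg)

# The argument x is a promise: it is only evaluated when f first uses it.
f <- function(x) {
  note("entering f")
  x  # the promise is forced here
}

result <- f({
  note("evaluating argument")
  42
})
print(events)  # "entering f" is recorded before "evaluating argument"

# An argument that is never used is never evaluated, so stop() does not fire.
g <- function(x) "done"
unused <- g(stop("this is never evaluated"))
print(unused)
```

As far as I can tell this only shows when an argument gets evaluated; it says nothing by itself about whether a data frame is copied.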
I want to understand how this works with dplyr. Are there tricks to using dplyr that avoid making copies of the data frame when all we are really doing is selecting rows and columns? I haven't been able to find a guide to this, so I made the short reprex below, which shows which dplyr verbs did or did not make a copy of the data. Does anyone know of a guide?
For context, I am trying to streamline a Shiny app that filters and sorts a sizeable data frame.
```r
library(pryr)
library(dplyr)

# data frame with a million doubles
df <- data.frame(x = runif(500000), y = runif(500000))
object_size(df)
#> 8 MB

# rename doesn't make a copy
df2 <- df %>% rename(w = x)
object_size(df, df2)
#> 8 MB

# filter does make a copy
df2 <- df %>% filter(x > 0.5)
object_size(df, df2)
#> 12 MB

# arrange does make a copy
df2 <- df %>% arrange(x)
object_size(df, df2)
#> 16 MB

# select doesn't make a copy
df2 <- df %>% select(y)
object_size(df, df2)
#> 8 MB
```
Created on 2019-07-06 by the reprex package (v0.3.0)
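The same idea can be pushed down to individual columns. This is only a rough sketch, not an established method: `shares_memory` is a hypothetical helper I made up, and it assumes that `object_size()` counts memory shared between its arguments only once, which is what the reprex above appears to show.

```r
library(pryr)
library(dplyr)

# Hypothetical helper: TRUE if the combined size of a and b is smaller than
# the sum of their individual sizes, i.e. some memory is counted only once.
shares_memory <- function(a, b) {
  combined <- as.numeric(object_size(a, b))
  separate <- as.numeric(object_size(a)) + as.numeric(object_size(b))
  combined < separate
}

df <- data.frame(x = runif(100000), y = runif(100000))

df_sel <- df %>% select(y)   # column selection
df_arr <- df %>% arrange(x)  # row reordering

# Compare a column before and after each verb.
print(shares_memory(df$y, df_sel$y))
print(shares_memory(df$x, df_arr$x))
```

If this works the way I expect, it would tell me which columns a given verb reuses and which it allocates afresh, which is closer to what I need than the size of the whole data frame.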