Pipes and memory usage

Are there memory usage benefits or penalties when using pipes (%>%)? If you have a very large matrix or data frame, is it better to process it one step at a time or are pipes okay? I am concerned that the entire chain is stored in memory.

Yes, using the pipe (just like using any function, really) comes with a tradeoff in terms of memory/CPU usage. Since you are adding an extra function call, it will use somewhat more resources.

However, whether this matters in your application is largely a judgment call that only you can make, ideally backed by hard data. Try both approaches on data as close to your "real" use case as possible and compare them with, for example, the microbenchmark or bench packages. You can also use the code profiler in RStudio for a more visual approach with graphs.
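
A comparison with bench could look roughly like this. It is only a sketch: it assumes the bench and magrittr packages are installed, and the data frame `big` and the two-step chain are made up, so swap in your real data and steps. Unlike microbenchmark, `bench::mark()` also reports `mem_alloc`, the total memory R allocated while running each expression (not peak usage). It relies on R's memory profiling, which some R builds do not enable.

```r
library(bench)     # assumes bench is installed
library(magrittr)  # assumes magrittr is installed (provides %>%)

set.seed(42)
# Hypothetical "large" data frame for illustration
big <- data.frame(x = runif(1e6), y = runif(1e6))

results <- bench::mark(
  # The chain written with the pipe
  piped = big %>%
    subset(x > 0.5) %>%
    transform(z = x + y),
  # The same steps, overwriting one variable
  stepwise = {
    tmp <- subset(big, x > 0.5)
    tmp <- transform(tmp, z = x + y)
    tmp
  },
  iterations = 20  # by default, mark() also checks both results are equal
)

# Median run time and memory allocated per expression
results[, c("expression", "median", "mem_alloc")]
```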

As a rule of thumb, though: while pipes do add a bit of computational overhead, they also reduce mental overhead (IMHO, obviously) by making your intentions much clearer.

To your specific question about the entire chain being stored in memory: I'm not sure I understand what you mean, but it seems to me that the answer is no, pipes don't store anything in memory implicitly.
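
One way to see what a chain does and does not keep around is to compare three versions of the same steps and list which names each one leaves bound. This is a minimal base R sketch: it uses the native pipe `|>` (which assumes R 4.1 or later) to stay dependency-free, and the same idea should hold for magrittr's `%>%`. The functions `piped()`, `overwrite()` and `named_steps()` are hypothetical names made up for illustration.

```r
set.seed(42)
big <- data.frame(x = runif(1e6), y = runif(1e6))

# Piped: intermediate results are passed along, never bound to a name
piped <- function(df) {
  result <- df |>
    subset(x > 0.5) |>
    transform(z = x + y)
  list(result = result, names = ls())  # ls() lists this function's variables
}

# Step by step, overwriting the same variable each time
overwrite <- function(df) {
  result <- subset(df, x > 0.5)
  result <- transform(result, z = x + y)
  list(result = result, names = ls())
}

# Step by step with a new name per step: every intermediate stays referenced
named_steps <- function(df) {
  step1 <- subset(df, x > 0.5)
  step2 <- transform(step1, z = x + y)
  list(result = step2, names = ls())
}

p <- piped(big)
o <- overwrite(big)
n <- named_steps(big)

p$names  # "df" "result"
o$names  # "df" "result"
n$names  # "df" "step1" "step2"
```

All three return the same data frame. The difference is that `named_steps()` keeps `step1` referenced (and therefore in memory) until the function returns, even though nothing needs it once `step2` exists. At top level, each `stepN <- ...` likewise stays in your workspace until you `rm()` it.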

What I meant is: if I have a long chain, will all the steps reside in memory? For comparison, if I execute each step separately, each one overwrites the previous result as long as I keep using the same variable.

This is essentially what happens in a long chain as well. However, keep in mind that parts of the data frame (if you are working with a data frame) will be copied. If that's not what you want (e.g., your data is too big for the RAM you have), you might consider using something like data.table. It won't magically solve all your problems, but since it can modify data in place, it typically needs less RAM overall.
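
To make the data.table suggestion a bit more concrete, here is a minimal sketch, assuming the data.table package is installed (`dt` and its columns are made up for illustration). `:=` adds a column by reference, and `address()` lets you check that `dt` is still the same object afterwards. This applies to adding or modifying columns; filtering rows, e.g. `dt[x > 0.5]`, still creates a new table.

```r
library(data.table)  # assumes the data.table package is installed

set.seed(42)
dt <- data.table(x = runif(1e6), y = runif(1e6))

addr_before <- address(dt)  # memory address of the table before the update

# := adds the new column by reference: dt is updated in place, not copied
dt[, z := x + y]

addr_after <- address(dt)
identical(addr_before, addr_after)  # same address: dt was not copied
```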