Improving coding efficiency in R

Hi all,

Quite new to R but making steady progress. Started about 2 or 3 years ago and feeling more comfortable all the time. However, I feel there are areas where I could be more efficient and make my code easier to read, share, and utilise. I'm looking for general tips and anything people have found that makes life a bit easier. A couple of specific areas where advice would be awesome:

  1. Rerunning analysis across slightly different datasets. Two examples here. First, if I rerun the analysis (machine learning, multilevel modelling) on the same dataset with different variables, my current workflow is to open a new markdown file, go to the data cleaning stage, and add/remove the variables of interest. Second, running the same machine learning analysis across different subsets of the data. This follows a similar pattern: the code is copied and pasted, then changed to assess subset 1 or subset 2.
  2. Removing large numbers of variables. I work with some datasets that have a large number of variables and am wondering if anyone has tips or strategies for trimming these down. I do use helpers such as starts_with, ends_with, and contains, but at some stage I still end up with a dplyr::select() listing a large number of variables. Not sure if there is a better way to do this!

Thank you in advance for any tips or advice. If anything needs to be clearer, please let me know.

The most general strategy for replacing a bothersome copy/paste/slightly-change workflow is to write a function and call it with the different parameters you want.
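As a rough sketch of that idea (not the only way to do it), the example below wraps a model fit in a function and then calls it with different variables and on different subsets. The helper name `fit_model`, the use of `lm()`, and the built-in `mtcars` data are all assumptions chosen for illustration; your own analysis would go inside the function instead.

```r
# Hypothetical helper: fit the same model for any outcome and set of predictors.
fit_model <- function(data, outcome, predictors) {
  # reformulate() builds a formula such as mpg ~ wt + hp from character vectors
  model_formula <- reformulate(predictors, response = outcome)
  lm(model_formula, data = data)
}

# Same analysis, different variables: only the arguments change
fit_small <- fit_model(mtcars, "mpg", c("wt", "hp"))
fit_large <- fit_model(mtcars, "mpg", c("wt", "hp", "qsec"))

# Same analysis, different subsets: split the data, then apply the function
# to each piece instead of copying and editing the code per subset
fits_by_cyl <- lapply(
  split(mtcars, mtcars$cyl),
  fit_model,
  outcome = "mpg",
  predictors = c("wt", "hp")
)
```

Each element of `fits_by_cyl` is a fitted model named after its subset, so adding another subset or variable set means changing an argument rather than duplicating a markdown file.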

  1. It's not clear which tricks you already know, but you didn't mention that selections can also be based on variable type, with where(yourpredicatehere). As an alternative to writing rules for what to bring in, you can write rules for what to leave out: both the minus sign (-) and the exclamation mark (!) negate a selection. Mostly, though, the answer will depend on your datasets and what you are doing. One idea: if different sets of variables serve different roles, you can store each set in a named vector and refer to the sets by name.
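library(dplyr)

# Group column names by the role they play
myvars_for_x <- c("x1", "x2")
myvars_for_y <- c("ya", "yb", "yc")

# Select both groups at once; all_of() errors if a named column is missing
select(mydata, all_of(c(myvars_for_x, myvars_for_y)))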
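To illustrate the type-based and "leave out" selections mentioned above, here is a small sketch. It assumes dplyr is installed and uses the built-in `iris` data purely as a stand-in for your own dataset; the result names are made up for the example.

```r
library(dplyr)

# Keep columns by type: where() takes a predicate function such as is.numeric
numeric_only <- select(iris, where(is.numeric))

# Leave columns out with "!" applied to a selection helper
no_petal <- select(iris, !starts_with("Petal"))

# Leave a named column out with "-"
no_species <- select(iris, -Species)

names(no_petal)
```

Writing down what to drop is often shorter than listing everything to keep when only a handful of columns are unwanted.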

@nirgrahamuk

This is quite helpful, thank you! I think that is one thing I can get better at...do you have any resources or tips to learn how to effectively write functions?
