How can I get help to learn to shorten my code?

I know this might sound odd, but I wrote a lot of code and I am just not happy with it (even though it works!) because I think there might be ways to reduce its sheer length.

To write shorter code next time, how should I go about getting help to make my code more efficient?
I am sorry I did not provide a minimal example :slight_smile:


Welcome. Could you tell us some of the coding practices that you think might be unnecessarily lengthening your code? Do your programs run more slowly than you expect they should? How large is your data (rows x columns)? What kind of processing are you doing?

As a general rule, code should communicate to its reader what it does and how it does it. Code that is too abbreviated may fail at that without providing any noticeable gain in performance. Because R is an interpreted language, unlike C++, optimizing code for speed through brevity doesn't yield the same benefits.


Hi,

Here are a few general tips that might help (depending on how experienced you are, you might already know these):

  • Creating loops: If you have lines of code that are repetitive (but with different input) and come one after another, you can write a loop that runs the same code repeatedly, each time with different input.
  • Writing functions: If you have pieces of code that are repetitive but are executed in different parts of the script (with other code in between), you can write a function to generalize the repetitive parts and call it whenever you need it.
  • Using tidyverse: The R tidyverse takes a bit of getting used to, but once you understand its structure, this way of writing code increases readability, its functions are generally well optimised, and it can often reduce the number of lines you write by quite a bit.
  • There's probably much more, but this is what I can think of off the top of my head ...
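
To make the first two tips concrete, here is a minimal base R sketch. The `inputs` list and the `rescale()` helper are made up for illustration and are not from anyone's actual script.

```r
# Hypothetical inputs: three numeric vectors standing in for real data sets.
inputs <- list(a = c(1, 2, 3), b = c(10, 20, 30), c = c(5, 5, 5))

# Repetitive version: the same line typed once per input.
mean_a <- mean(inputs$a)
mean_b <- mean(inputs$b)
mean_c <- mean(inputs$c)

# Loop version: the logic is written once and run for every input.
means <- numeric(length(inputs))
names(means) <- names(inputs)
for (nm in names(inputs)) {
  means[nm] <- mean(inputs[[nm]])
}

# Function version: name the repeated step so it can be called anywhere.
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
rescaled_a <- rescale(inputs$a)
rescaled_b <- rescale(inputs$b)
```

The loop fits when the repeated lines sit next to each other; the function fits when the same step is needed in several places of the script.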

I did not put links in here, because if you type any of these keywords in Google or YouTube you'll get plenty of great tutorials to choose from.

Finally, if you do have a particular problem, just reach out to us on here and we'll see if we can optimize your code (providing you have a minimal example of course)

Good luck,
PJ


Hi @technocrat
So I think in my code, I am performing very similar procedures over and over again. I tried to put parts into functions, but it is still lengthy.
Especially I do the following a lot:
modeloutput1 <- model1 ...
rearrange variables in modeloutput1
plot modeloutput1
save modeloutput1

modeloutput2 <- model2...
rearrange variables in modeloutput2
plot modeloutput2
save modeloutput2

and now I have already written modeloutput 8 times. My script is around 20 times this length... and obviously does stuff :frowning:

It is mostly about the time I spend writing (and at the same time thinking: can't this be done any faster???). I think it is also not really good at communicating to others now, because its sole purpose is to create the plots in the way I like them.

Thank you!


Hey @pieterjanvc
Thanks to you, too.

I already use some functions, yes! But no loops, except if/else "loops", if that counts as a loop?
I have started to use data.table but not yet tidyverse; I will try that. Edit: I already use some packages from tidyverse such as magrittr (with limited success), ggplot (medium success but it does the job) as well as dplyr and forcats

Please also look at my "tiniest" example above. It seems that R does not like to put names at different places but rather wants me to type names over and over again...

Thank you!

Thanks for the concrete example, and you're right: performing the same operations on structurally identical objects (in your case, models) can and should be shortened. This can be done without sacrificing readability.

For the sake of simplicity (and because I don't have a good guess at how the rearrangement of variables works), let's look at a function. You recall from schooldays f(x) = y, where f is the name of the function, x is its input or argument and y is its output or result. It works the same way in R.

The first thing (some people say the hardest!) is naming your function. To avoid naming conflicts with R functions in your namespace, one way is to use something like my_plot as a name.

To create it, you use the R keyword function

my_plot <- function(...

give it an argument

my_plot <- function(x) ...

and write the operations to be performed

my_plot <- function(x) {
   ...
}

That permits

my_plot(model1)
my_plot(model2)
my_plot(model3)
...
my_plot(model20)

This is only an outline of the approach, because I didn't want to make too many assumptions about your model objects.
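
To show what the body of such a function might look like, here is a minimal sketch. It makes its own assumptions: the two `lm()` models on the built-in `mtcars` data are stand-ins for the real models, base graphics are used instead of ggplot2 to keep it dependency-free, and the `file` argument is hypothetical.

```r
# Hypothetical stand-ins for the real models: two linear models on mtcars.
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ hp, data = mtcars)

# Plot coefficient estimates with approximate 95% intervals and save the
# plot as a PDF. Returns the file path invisibly.
my_plot <- function(x, file = tempfile(fileext = ".pdf")) {
  est <- coef(summary(x))  # matrix with "Estimate" and "Std. Error" columns
  lower <- est[, "Estimate"] - 1.96 * est[, "Std. Error"]
  upper <- est[, "Estimate"] + 1.96 * est[, "Std. Error"]
  idx <- seq_len(nrow(est))

  pdf(file)
  on.exit(dev.off())  # close the graphics device even if plotting fails
  plot(est[, "Estimate"], idx, xlim = range(lower, upper), yaxt = "n",
       xlab = "Estimate", ylab = "", pch = 16)
  segments(lower, idx, upper, idx)
  axis(2, at = idx, labels = rownames(est), las = 1)
  invisible(file)
}

file1 <- my_plot(model1)
file2 <- my_plot(model2)
```

The plot, the intervals and the save step are written once; each further model costs one line.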

When you've gained some more familiarity with tidyverse (starting with dplyr and ggplot), take a look at modelr for streamlining the rearrangement of variables.


It's hard to give you concrete advice without context, but if your models are applied over subsets of a larger data frame, or with different parameters to the same data, then you could shorten your code a lot by using the purrr::map() family of functions. If you could put together a more complete example, we could give you much better help.
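
As a rough idea of what that could look like, here is a sketch under stated assumptions: the subsets come from the built-in `mtcars` data, the model is a plain `lm()`, and base R's `lapply()` is used so it runs without extra packages. With purrr installed, `purrr::map()` could be used in place of `lapply()`.

```r
# Hypothetical example: fit the same model on each subset of a data frame.
subsets <- split(mtcars, mtcars$cyl)  # named list with elements "4", "6", "8"

# The model specification is written once.
fit_one <- function(d) lm(mpg ~ wt, data = d)

# One call fits all models; the result is a named list of fitted models.
models <- lapply(subsets, fit_one)
# With purrr: models <- purrr::map(subsets, fit_one)

# The fitted models stay available for later steps, e.g. extracting slopes.
slopes <- vapply(models, function(m) coef(m)[["wt"]], numeric(1))
```

Because the models are kept in a list, they can be reused later by name, for example `models[["4"]]`.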

I have a couple of examples of "automating" repetitive tasks in R using functions and purrr::map() loops on my blog. Based on some of your other comments in this thread, you may find these useful as a starting point for how one person thought about and tackled these problems. :wink:

One is about automating model fitting when the response variables vary but the rest of the model structure is the same:

The other is about making exploratory plots (with ggplot2), although this could be extended to making nicer plots, as well :slightly_smiling_face::


Well I run models on different subsets of data, yes. However I always have to use the models again, right?

thanks technocrat

I do use functions already.

However, I mean I don't want the same title on the plots.

I give you a somewhat better reprex (but for now without any data), so you won't see the graphs, ok?

model1data <- data.frame(mm(depvar, choice ~ somesthing +  somesthingelse ,
                   id = ~uid, design=conjoint_design, baselines = baselines_base))


checkfunction(model1data)
#ENTER DF here:
model1data<- setDT(model1data)
is.data.table(model1data)
model1data<- preparefunction(model1data)
#my preparefunction sets the values underneath
if(is.combined==FALSE & is.somesthingelse==TRUE & is.eb==TRUE & is.ex == FALSE){s <- 16}
if(is.combined==FALSE & is.somesthingelse==TRUE & is.eb==FALSE & is.ex == FALSE){s <- 17}
if(is.combined==FALSE & is.somesthingelse==TRUE  & is.ex == TRUE){s <- 15}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ex == TRUE){s <- 0}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ea ==TRUE & is.ea == FALSE){s <- 1}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ea ==FALSE & is.ea == FALSE){s <- 2}


p_model1data      <- ggplot(model1data, aes(x = level, y = estimate, color=somesthingelse)) +
  geom_pointrange(aes(min = estimate - 1.95 * std.error, max = estimate + 1.95 * std.error), shape = s) +
  theme_bw() + 
  facet_wrap(something ~ ., scales = "free_y", nrow = 5, strip.position = "left") +
  coord_flip() +
  ylab("Always the same") +
  xlab("Always the same2") +
  #  ylim(-0.3,0.3)+
  geom_hline(yintercept = 0.5, lty="dashed") +
  ggtitle("Actually it would be awesome if title would change according to input i.e. is.ex and is.ea and obs") +
  if(is.ex==FALSE){scale_colour_manual(values=c("#4EC150", "#330042"))} # it actually would be helpful if colors could be linked to specific inputs
p_model1data


ggsave(filename="p_model1data.pdf", plot=last_plot())

# C ------------------------------------------------------------------
subset <- tt
setDT(subset)
subset <- subset[is.ea==1&is.ex==0]

model2data <- data.frame(mm(df_conj, depvar, choice ~ somesthing +  somesthingelse,
              id = ~uid, design=conjoint_design, baselines = baselines_base))

checkfunction(model2data)
#ENTER DF here:
model2data<- setDT(model2data)
is.data.table(model2data)
model2data<- preparefunction(model2data)

#the names of my data frames are not like here, but convey a meaning!
#i.e. NO number that just goes up

if(is.combined==FALSE & is.somesthingelse==TRUE & is.eb==TRUE & is.ex == FALSE){s <- 16}
if(is.combined==FALSE & is.somesthingelse==TRUE & is.eb==FALSE & is.ex == FALSE){s <- 17}
if(is.combined==FALSE & is.somesthingelse==TRUE  & is.ex == TRUE){s <- 15}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ex == TRUE){s <- 0}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ea ==TRUE & is.ea == FALSE){s <- 1}
if(is.combined==FALSE & is.somesthingelse==FALSE & is.ea ==FALSE & is.ea == FALSE){s <- 2}

 
  
p_model2data      <- ggplot(model2data, aes(x = level, y = estimate, color=somesthingelse)) +
  geom_pointrange(aes(min = estimate - 1.95 * std.error, max = estimate + 1.95 * std.error), shape = s) +
  theme_bw() + 
  facet_wrap(something ~ ., scales = "free_y", nrow = 5, strip.position = "left") +
  coord_flip() +
  ylab("Always the same") +
  xlab("Always the same2") +
  #  ylim(-0.3,0.3)+
  geom_hline(yintercept = 0.5, lty="dashed") +
  ggtitle("Actually it would be awesome if title would change according to input i.e. is.ex and is.ea and obs") +
  if(is.ex==FALSE){scale_colour_manual(values=c("#4EC150", "#330042"))} # it actually would be helpful if colors could be linked to specific inputs
p_model2data


ggsave(filename="p_model2data.pdf", plot=last_plot())


# Combine 1 & 2 ------------------------------------------------------------------

 model_c_data <- data.frame(bind_rows(model2data, model1data, .id = "idies"))

checkfunction(model_c_data)
model_c_data<- setDT(model_c_data)
is.data.table(model_c_data)
model_c_data<- preparefunction(model_c_data)

p_model_c_data      <- ggplot(model_c_data, aes(x = level, y = estimate, color=idies)) +
  geom_pointrange(aes(min = estimate - 1.95 * std.error, max = estimate + 1.95 * std.error, shape = idies)) +
  theme_bw() + 
  facet_wrap(something ~ ., scales = "free_y", nrow = 5, strip.position = "left") +
  coord_flip() +
  ylab("Marginal Means") +
  xlab("Attributes") +
  #  ylim(-0.3,0.3)+
 geom_hline(yintercept = 0.5, lty="dashed") +
  ggtitle("Would be awesome if this title knew it were a combined plot") +
  geom_text(aes(label = is.ea), colour = "black", size = 2.5, hjust=1.05, vjust=1.2)
if(is.ea==TRUE & is.somesthingelse == TRUE){p_model_c_data <- p_model_c_data + scale_shape_manual(values=c(17,16))}
if(is.ex==FALSE){p_model_c_data <- p_model_c_data + scale_colour_manual(values=c( "#330042","#4EC150"),aesthetics = "colour")}
if(is.combined==TRUE & is.ex==FALSE){p_model_c_data <- p_model_c_data +  scale_shape_manual(values=c(2,1))}
p_model_c_data

ggsave(filename="p_model_c_data.pdf", plot=last_plot())

Thanks, I read through it. It is useful! Great, it will help me a lot next time!


Oh, of course you'd want individual titles, so you need a function with two arguments:

my_plot <- function(x,y) {
   ...
}

my_plot(model1, "Actually it would be awesome if title would change according to input i.e. is.ex and is.ea and obs")

This doesn't capture all the model specific information you might want to vary, but illustrates that functions can take multiple arguments. And variation is not the same as repetition. :grinning:
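
As a possible next step, the title itself could be built from the inputs rather than typed out. The sketch below is only an assumption about how that might look: `make_title()`, its wording, and the toy data frame are hypothetical, the flags merely mirror is.ex and is.ea from the example, and base graphics stand in for ggplot2.

```r
# Build a plot title from the flags that describe a model run.
# Argument names mirror is.ex / is.ea; the wording is made up.
make_title <- function(is_ex, is_ea, n_obs) {
  sprintf("%s, %s (n = %d)",
          if (is_ex) "ex" else "not ex",
          if (is_ea) "ea" else "not ea",
          n_obs)
}

# Two-argument plotting function: the data and the title.
# Assumes x has the columns 'level' and 'estimate'.
my_plot <- function(x, title, file = tempfile(fileext = ".pdf")) {
  pdf(file)
  on.exit(dev.off())  # always close the graphics device
  dotchart(x$estimate, labels = x$level, main = title, xlab = "Estimate")
  invisible(file)
}

# Toy data in place of real model output.
toy <- data.frame(level = c("low", "mid", "high"),
                  estimate = c(0.42, 0.50, 0.58))

title1 <- make_title(is_ex = FALSE, is_ea = TRUE, n_obs = nrow(toy))
out <- my_plot(toy, title1)
```

Colours or point shapes could be passed in the same way, as further arguments derived from the flags.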


I'd suggest using lists.
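
A minimal sketch of the idea, with assumptions stated up front: the two small data frames are made-up stand-ins for prepared model output, `save_plot()` is a hypothetical helper, and base R's `Map()` is used so that nothing beyond base R is needed.

```r
# Hypothetical named list of model outputs; the names carry the meaning.
outputs <- list(
  baseline = data.frame(level = c("a", "b"), estimate = c(0.45, 0.55)),
  treated  = data.frame(level = c("a", "b"), estimate = c(0.40, 0.60))
)

out_dir <- tempdir()

# Plot one data frame and save it under a file name derived from its name.
save_plot <- function(d, name) {
  file <- file.path(out_dir, paste0("p_", name, ".pdf"))
  pdf(file)
  on.exit(dev.off())  # always close the graphics device
  dotchart(d$estimate, labels = d$level, main = name)
  file
}

# Map() walks over the list and its names in parallel.
files <- Map(save_plot, outputs, names(outputs))

# The combined data set also falls out of the list, with an id column.
combined <- do.call(
  rbind,
  Map(function(d, name) data.frame(idies = name, d, stringsAsFactors = FALSE),
      outputs, names(outputs))
)
```

Each object's name is typed once, in the list, and is then reused for the title and the file name.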

I tried to shrink your code, putting data preparation in the folder /data/ and utility functions in the folder /R/. The file /analyses.R reproduces your analyses (clearly, data are not stored, and functions don't work :slight_smile: ).

You can find everything here.

IMO "analyses.R" files should be used to produce analyses only. That makes the code much easier for humans to understand (including me or future collaborators). I have found myself spending much more time (re-re-re-)reading my code (to understand, correct, explain ... it) than writing it. I.e., I scroll much more than I type! :slight_smile: Hence, I try to put more effort into writing code that will take me less time to read and understand in the future!

Moreover, by splitting everything that "does something" into its own function, the code gains several bonuses:

  1. you can replicate function calls without repeating all their code (e.g., using the purrr package)
  2. you can test the exact functionality of each function (e.g., using the testthat package)
  3. if you need to change something you have to change it in one place only (avoiding many checks and possible sources of bugs)
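
Points 2 and 3 can be illustrated with a small sketch. `pick_shape()` is a hypothetical helper whose shape codes follow three of the if() lines in the example above (15, 16, 17); everything else about it is an assumption. Base R's `stopifnot()` is used to keep it dependency-free. With testthat, the same checks would go inside `test_that()` using `expect_equal()`.

```r
# Hypothetical helper: choose a point shape from two flags, replacing a
# chain of separate if() statements. The codes 15/16/17 follow the example;
# the function name and arguments are made up.
pick_shape <- function(is_ex, is_eb) {
  if (is_ex) {
    15
  } else if (is_eb) {
    16
  } else {
    17
  }
}

# Checks that pin down the behaviour. If the mapping ever needs to change,
# it changes in pick_shape() only, and these checks show what was affected.
stopifnot(
  pick_shape(is_ex = TRUE,  is_eb = FALSE) == 15,
  pick_shape(is_ex = FALSE, is_eb = TRUE)  == 16,
  pick_shape(is_ex = FALSE, is_eb = FALSE) == 17
)
```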

As the code and the analyses become more sophisticated (or repeated!), I think a package is a better environment for the work. Usually, I create a package straight away for every analysis or project (unless it is very small in both size and duration). Once you have some practice, setting up a package takes no more than a couple of hours (often much less: usethis::create_package() creates the basic skeleton in a few seconds...). A package gives you many tools for automating tests. With a package, all the code is "packaged", so it is easier to share the whole project or to ask someone else to contribute. There are also fewer constraints on the names of internal functions:

  • they cannot conflict with anything external (if not exported)
  • external functions cannot conflict with the internal ones (functions that are neither internal nor in base must be called with their package prefix, e.g., dplyr::select()).
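
The package prefix mentioned in the last bullet also works in an ordinary script. A tiny sketch, using a made-up `filter()` helper and the `filter()` function from the stats package, which ships with R:

```r
# A hypothetical helper that reuses a common name.
filter <- function(x, keep) x[keep]

# The local definition is found first ...
kept <- filter(1:5, c(TRUE, FALSE, TRUE, FALSE, TRUE))

# ... while the package prefix still reaches the original function
# (here a 3-point moving average from the stats package).
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
```

Writing the prefix makes it clear at the call site which function is meant.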

I can suggest The Art of Readable Code by Dustin Boswell and Trevor Foucher. It was incredible for me!
