loop (calculate according to several factors of the data frame)?

louna123 · November 11, 2019, 9:08am

Hello,
I'm new to RStudio and I have a problem that I couldn't solve by searching the forum:
I would like to write a loop in R to calculate the yield according to several factors in my data frame (variety, year, seeding depth...). The loop I wrote runs, but its output is not interpretable: it is a long list of numbers, and we can't tell which value belongs to which combination.

listyear<- unique(data$year)
print(listyear)
listtype<- unique(data$Type)
print(listtype)
listvariety<- unique(data$VarFH.PH)
print(listvariety)
listseed<- unique(data$Appli1.2)
print(listseed)

for (yy in c(1:length(listyear))){
  for (xx in c(1:length(listvariety))) {
    for (zz in c(1:length(listseed))) {
      for(ww in c(1:length(listtype))) {
        mean_yeald_type1<-mean(data$yeald [which((data$Type==listtype[ww])&(data$VarFH.PH==listvariety[xx])&(data$Appli1.2==listseed[zz]))], na.rm=TRUE)  
        print(mean_yeald_type1)
      }
    }
  }
  tmp <- fp[which(data$year==listyear[yy]),]
}

Do you have any advice for me?

Thank you in advance for your help.
Louna

valeri · November 11, 2019, 9:29am

Hi @louna123,

could you include a subset of your data data frame in a format that others can copy-paste into RStudio? See e.g. FAQ: How to do a minimal reproducible example (reprex) for beginners

In this way you will greatly increase your chances of getting a fast and meaningful response.

Once you have done that, a much simpler solution seems to be to use dplyr to group_by and then aggregate; some examples can be found here: https://rdrr.io/cran/dplyr/man/group_by.html

StatSteph · November 11, 2019, 5:53pm

It looks like you want the mean of a variable for each level of year, type, variety, and seed. I would suggest this:

library(tidyverse)
meandata <- data %>%
  group_by(year, Type, VarFH.PH, Appli1.2) %>%
  summarise(mean_yeald_type1=mean(yeald, na.rm=TRUE))

This calculates the mean of yeald for every combination of year, Type, VarFH.PH, and Appli1.2, and you'll have a data frame with 5 columns: year, Type, VarFH.PH, Appli1.2, and mean_yeald_type1.
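To see what that result looks like, here is a minimal sketch on made-up data. Only the column names come from this thread; the values are invented for illustration, and the final ungroup() is an optional extra that drops the leftover grouping so the result behaves like an ordinary table.

```r
library(dplyr)

# Hypothetical toy data: the column names match the thread, the values are invented.
data <- data.frame(
  year     = c(2018, 2018, 2018, 2019, 2019, 2019),
  Type     = c("A", "A", "B", "A", "A", "B"),
  VarFH.PH = c("FH", "FH", "PH", "FH", "FH", "PH"),
  Appli1.2 = c(1, 1, 2, 1, 1, 2),
  yeald    = c(5, 7, 6, NA, 8, 4)
)

# One row per combination of the four factors, with the mean of yeald
# (missing values are ignored because of na.rm = TRUE).
meandata <- data %>%
  group_by(year, Type, VarFH.PH, Appli1.2) %>%
  summarise(mean_yeald_type1 = mean(yeald, na.rm = TRUE)) %>%
  ungroup()

print(meandata)
```

Each row of meandata carries its own year, Type, VarFH.PH and Appli1.2, so every mean is labelled with the combination it belongs to.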

woodward · November 11, 2019, 6:00pm

Do it the way @StatSteph suggested. In R we try to avoid writing loops over the rows of data frames, as they can be slow. Most R functions work on vectors, so we take advantage of that. It's a bit different to programming in other languages. Typically you will look for a package that has functions to do what you want, in this case the dplyr package (in tidyverse). You can do it the way you suggested, but it's going to be slow and verbose.
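For comparison, this is roughly what a loop-based version could look like if it stored each mean next to its labels instead of printing it. It is only a sketch on invented data (the column names are from the thread, the values are not). Note that the loop in the question filters on Type, VarFH.PH and Appli1.2 but not on year; the sketch below assumes all four factors are meant to be used.

```r
# Hypothetical toy data: the column names match the thread, the values are invented.
data <- data.frame(
  year     = c(2018, 2018, 2018, 2019, 2019, 2019),
  Type     = c("A", "A", "B", "A", "A", "B"),
  VarFH.PH = c("FH", "FH", "PH", "FH", "FH", "PH"),
  Appli1.2 = c(1, 1, 2, 1, 1, 2),
  yeald    = c(5, 7, 6, NA, 8, 4)
)

# One row per combination that actually occurs in the data.
combos <- unique(data[, c("year", "Type", "VarFH.PH", "Appli1.2")])
combos$mean_yeald_type1 <- NA_real_

# Loop over the combinations and store each mean beside its labels.
for (i in seq_len(nrow(combos))) {
  keep <- data$year == combos$year[i] &
    data$Type == combos$Type[i] &
    data$VarFH.PH == combos$VarFH.PH[i] &
    data$Appli1.2 == combos$Appli1.2[i]
  combos$mean_yeald_type1[i] <- mean(data$yeald[keep], na.rm = TRUE)
}

print(combos)
```

This gives an interpretable table, but it takes noticeably more code than the group_by() version for the same result.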

StatSteph · November 11, 2019, 6:12pm

I want to follow up on this. tidyverse makes code faster to write and more readable. It does not always make computation faster; in fact, it can make it slower. For smaller data, the time saved is in WRITING the code, not in running it.
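If run time does matter for a particular data set, it can simply be measured. The sketch below times a dplyr grouped mean against base R's aggregate() on simulated data; the data frame sim and its columns g and y are made up for this illustration, and the timings will differ from machine to machine, so no particular outcome is implied.

```r
library(dplyr)

# Simulated data (hypothetical): 100,000 rows spread over up to 1,000 groups.
set.seed(1)
n <- 1e5
sim <- data.frame(
  g = sample(1:1000, n, replace = TRUE),
  y = rnorm(n)
)

# Time the dplyr version.
t_dplyr <- system.time(
  res_dplyr <- sim %>% group_by(g) %>% summarise(m = mean(y))
)

# Time a base R equivalent.
t_base <- system.time(
  res_base <- aggregate(y ~ g, data = sim, FUN = mean)
)

# Elapsed seconds for each approach.
print(c(dplyr = t_dplyr[["elapsed"]], base = t_base[["elapsed"]]))
```

Both approaches return the same group means, so the comparison is only about speed and readability.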

woodward · November 11, 2019, 6:29pm

@StatSteph, getting off topic a bit, but I usually make the assumption that tidyverse is going to be fast (maybe not fastest) because (1) tidyverse is getting faster with new optimizations, (2) needing something faster than tidyverse is a special case and then I would have to learn data.table (which I've been resisting), (3) when it's slow it's usually not tidyverse's fault but my own.

martin.R · November 11, 2019, 6:49pm

Whether speed matters or not is down to each scenario, but there are aspects of the tidyverse which are much slower than base, never mind data.table. In particular, when there are a large number of groups, group_by() operations can be very slow.

Incidentally, data.table's syntax is actually very simple and not more complex than the tidyverse - it's just more compact. Translating the above:

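library(data.table)
data <- as.data.table(data)  # the [i, j, by] syntax needs a data.table, not a plain data frame
meandata <- data[, .(mean_yeald_type1 = mean(yeald, na.rm = TRUE)), by = .(year, Type, VarFH.PH, Appli1.2)]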
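A self-contained sketch of the same grouped mean in data.table, on invented data (only the column names come from this thread; the object name dt is made up here). Two details matter for it to run: the object must be a data.table, and the .( ) around the mean has to be closed before the by = argument.

```r
library(data.table)

# Hypothetical toy data: the column names match the thread, the values are invented.
dt <- data.table(
  year     = c(2018, 2018, 2018, 2019, 2019, 2019),
  Type     = c("A", "A", "B", "A", "A", "B"),
  VarFH.PH = c("FH", "FH", "PH", "FH", "FH", "PH"),
  Appli1.2 = c(1, 1, 2, 1, 1, 2),
  yeald    = c(5, 7, 6, NA, 8, 4)
)

# j computes the mean, by lists the grouping columns.
meandata <- dt[, .(mean_yeald_type1 = mean(yeald, na.rm = TRUE)),
               by = .(year, Type, VarFH.PH, Appli1.2)]

print(meandata)
```

For an existing data frame, as.data.table(data) makes a converted copy, while setDT(data) converts it in place.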
