Looping through the levels of a categorical variable for ggplot

Hi there,

I am trying to plot the growth curves of different bacterial lines and create a separate plot for each line. I couldn't figure out how to design the for loop so that it iterates through the levels of the variable "line" in the dataframe Growthdata. I tried something like this:

for (i in levels(Growthdata$line)){
  split = Growthdata[Growthdata$line == i,]
  ggplot(data=split, aes(x=Time, y= OD600_Bl_sub))+
    geom_point(aes(color=rep))+
    labs(color='Replicate')+
    ggtitle(paste0("Line N", i))
  ggsave(paste0("Line H", i, ".tiff"))
}

But it seems like the looping variable cannot be used like that (within square brackets) in R.

I can give you a glimpse of the Growthdata dataframe:

structure(list(Time = c(0, 0.0833333333333333, 0.166666666666667, 
0.25, 0.333333333333333, 0.416666666666667, 0.5, 0.583333333333333, 
0.666666666666667, 0.75, 0.833333333333333, 0.916694444444444, 
1, 1.08333333333333, 1.16669444444444, 1.25002777777778, 1.33336111111111, 
1.41669444444444, 1.50002777777778, 1.58336111111111, 1.66669444444444, 
1.75002777777778, 1.83336111111111, 1.91669444444444, 2.00002777777778
), Trt = c("PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1", 
"PR_15_RG_N1_wiprey_hi_1", "PR_15_RG_N1_wiprey_hi_1"), OD600 = c(0.208100005984306, 
0.203700006008148, 0.202199995517731, 0.202099993824959, 0.202299997210503, 
0.20270000398159, 0.202800005674362, 0.203400000929832, 0.203999996185303, 
0.204999998211861, 0.205599993467331, 0.206699997186661, 0.206799998879433, 
0.206900000572205, 0.207800000905991, 0.209999993443489, 0.210600003600121, 
0.211899995803833, 0.211300000548363, 0.212200000882149, 0.211999997496605, 
0.212599992752075, 0.212400004267693, 0.213599994778633, 0.213100001215935
), OD600_Bl_sub = c(0.125700004398822, 0.121200005213419, 0.119633329411348, 
0.119466662406922, 0.119599997997284, 0.120000004768371, 0.120133340358734, 
0.120633333921432, 0.121366662283739, 0.12219999730587, 0.12279999256134, 
0.123866664866607, 0.123999997973442, 0.124099999666214, 0.125066667795182, 
0.127233328918616, 0.127700003484885, 0.129099994897842, 0.128433331847191, 
0.129333334664504, 0.129199996590614, 0.129766657948494, 0.129500004152457, 
0.130766659975052, 0.130300000309944), cycle = c("15", "15", 
"15", "15", "15", "15", "15", "15", "15", "15", "15", "15", "15", 
"15", "15", "15", "15", "15", "15", "15", "15", "15", "15", "15", 
"15"), phase = c("RG", "RG", "RG", "RG", "RG", "RG", "RG", "RG", 
"RG", "RG", "RG", "RG", "RG", "RG", "RG", "RG", "RG", "RG", "RG", 
"RG", "RG", "RG", "RG", "RG", "RG"), line = structure(c(2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AN", "N1", "N2", "N3", 
"N4", "N5", "N6", "N7", "N8"), class = "factor"), prey = c("wiprey", 
"wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", 
"wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", 
"wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", "wiprey", 
"wiprey", "wiprey", "wiprey"), pred = c("hi", "hi", "hi", "hi", 
"hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", 
"hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi", "hi"), 
    rep = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1")), row.names = c(NA, 25L), class = "data.frame")

The typical ggplot solution to this is to use facet_*. However, in your reprex every row has the same level of line (N1). If I change your data a bit so that Growthdata$line looks like this:

structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    .Label = c("AN", "N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"), class = "factor")

Then faceting would give you a plot like this:

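library(tidyverse)  # provides ggplot2 and the %>% pipe

# One panel per level of line that is present in the data
Growthdata %>% 
    ggplot(aes(x = Time, y = OD600_Bl_sub)) + 
    geom_line() +
    facet_wrap(vars(line))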

Is there a reason that you need to be doing what you are doing in a loop, as opposed to like this?

As @dvetsch75 says, your sample data is not big enough to work with, so I'm going to give you an example with the built-in iris data set. You can use nested data frames and purrr functions to iterate over subsets of your data. This code saves an individual plot for each level of a categorical variable.

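library(tidyverse)

iris %>% 
    # One row per species; the rows of each species go into a list-column called data
    group_nest(Species) %>% 
    # Build one plot per row: .x is the nested data frame, .y is the species
    mutate(plot = map2(.x = data,
                       .y = Species,
                       .f = ~{
                           ggplot(.x, aes(x = Sepal.Length, y = Sepal.Width)) +
                               geom_point() + 
                               labs(title = paste("Species:", .y))
                       })) %>% 
    # Save each plot; walk2() is used because ggsave() is called only for its side effect
    walk2(.x = .$plot,
         .y = .$Species, 
         .f = ~ ggsave(paste0("Species_", .y, ".tiff"), plot = .x)
    )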

Thanks for staying with me even though you couldn't really use the data. I didn't know how to include enough data points from the dataframe without sending you a huge wall of code. But I just saw that the code gets folded when posted anyway, so I could have included more.

I applied your method and it worked exactly as I wanted, thank you so much! However, map2(), walk2() and the dot-prefixed variables (.x, .y) are completely new to me, and I don't understand why it needs to be coded the way you did. Can you recommend any resources to learn this? And is it not possible to do this in a reasonable way using a for loop, like in classic programming?

Thank you! This is also a beautiful solution I can use.

This chapter from R4DS is a nice explanation of loops and map functions in R:

http://r4ds.had.co.nz/iteration.html
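
In case it helps before you read the chapter, here is a minimal sketch of what map2() and walk2() do, using toy vectors. The names lengths_cm and widths_cm are made up for illustration and have nothing to do with your data.

```r
library(purrr)

# Two toy vectors of equal length (hypothetical values)
lengths_cm <- c(1, 2, 3)
widths_cm  <- c(10, 20, 30)

# map2() steps through both vectors in parallel and returns a list of results.
# `~ .x * .y` is shorthand for function(.x, .y) .x * .y:
# .x is the current element of the first input, .y of the second.
areas <- map2(lengths_cm, widths_cm, ~ .x * .y)

# The same call written with an ordinary anonymous function
areas_explicit <- map2(lengths_cm, widths_cm, function(a, b) a * b)

# map2_dbl() does the same but returns a numeric vector instead of a list
areas_dbl <- map2_dbl(lengths_cm, widths_cm, ~ .x * .y)

# walk2() is meant for side effects (printing, saving files);
# it calls the function for each pair and returns its first input invisibly
walk2(lengths_cm, widths_cm, ~ message("length ", .x, ", width ", .y))
```

So in the iris example, map2() is used where the result (the plot) should be kept, and walk2() where only the side effect (writing the file) matters.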

It is completely possible, but the general consensus is that loops are rarely the preferred choice when programming in R; they are considered somewhat non-idiomatic.
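
For comparison, here is a rough for-loop sketch of the same task on the iris data. It is only one way to write it. The output folder out_dir is an assumption (a temporary folder here), and the plot is stored in an object and passed to ggsave() explicitly rather than relying on the last plot drawn.

```r
library(ggplot2)

# Hypothetical output folder; replace with your own path
out_dir <- tempdir()

# Keep the plots in a named list so they can be inspected later
plots <- list()

for (sp in levels(iris$Species)) {
  # Rows belonging to the current level
  subset_df <- iris[iris$Species == sp, ]

  # Build the plot and store it in an object
  p <- ggplot(subset_df, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    ggtitle(paste("Species:", sp))

  plots[[sp]] <- p

  # Pass the plot explicitly; width and height are in inches
  ggsave(file.path(out_dir, paste0("Species_", sp, ".tiff")),
         plot = p, width = 6, height = 4)
}
```

One thing to watch with a factor like line in the posted sample: levels() also returns levels that have no rows, so the loop would write empty plots for them. Looping over unique(as.character(Growthdata$line)) or levels(droplevels(Growthdata$line)) skips the unused levels.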
