Create a for loop to make multiple data frames?

cwright1 · June 3, 2021, 2:08pm

I'm having trouble making a loop that will iterate through my data and create multiple data frames.

Here's some dummy data:

mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"),
                   "height"=c(1,2,3,4,5,6,7),
                   "boy_1"=c(5,1,6,5,5,1,4),
                   "boy_2"=c(2,2,2,2,2,2,2),
                   "boy_3"=c(3,3,3,3,3,3,3),
                   "girl_1"=c(3,3,3,4,4,4,4),
                   "girl_2"=c(6,6,6,6,6,6,6))

mydf
   color height boy_1 boy_2 boy_3 girl_1 girl_2
1   blue      1     5     2     3      3      6
2 yellow      2     1     2     3      3      6
3    red      3     6     2     3      3      6
4  green      4     5     2     3      4      6
5   pink      5     5     2     3      4      6
6 orange      6     1     2     3      4      6
7   cyan      7     4     2     3      4      6

I want to say, "for every column after 'color' and 'height', make a new data frame that keeps the 'color' and 'height' columns." So in this case, I should get 5 data frames back (preferably in a list).

mydf_1 = color, height, boy_1
mydf_2 = color, height, boy_2
...
mydf_5 = color, height, girl_2
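For comparison, the requested result can also be built without an explicit loop. Here is a minimal base R sketch that assumes the dummy data above. `value_cols` and `mydf_list` are illustrative names and do not come from the original post:

```r
# Dummy data from the question
mydf <- data.frame(
  color  = c("blue", "yellow", "red", "green", "pink", "orange", "cyan"),
  height = c(1, 2, 3, 4, 5, 6, 7),
  boy_1  = c(5, 1, 6, 5, 5, 1, 4),
  boy_2  = c(2, 2, 2, 2, 2, 2, 2),
  boy_3  = c(3, 3, 3, 3, 3, 3, 3),
  girl_1 = c(3, 3, 3, 4, 4, 4, 4),
  girl_2 = c(6, 6, 6, 6, 6, 6, 6)
)

# Every column other than the two "key" columns
value_cols <- setdiff(names(mydf), c("color", "height"))

# One data frame per value column, each keeping color and height;
# setNames() labels each list element with its value column's name
mydf_list <- setNames(
  lapply(value_cols, function(col) mydf[, c("color", "height", col)]),
  value_cols
)

mydf_list$boy_2
```

Because the list elements are named after their value columns, you can write `mydf_list$girl_2` instead of keeping track of positions.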

I'm new to writing loops like these... This is my attempt (it does not work!):

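for(i in mydf[,3:ncol(mydf)]{   # syntax error: the for() header is missing its closing ")"
mylist <- list()                # recreated (emptied) on every pass through the loop

mydf <- i                       # i holds one column's values, not an index; this overwrites mydf

mylist[[i]] <- mydf #save dataframes to the list? (i is a numeric vector, so it cannot index the list)
}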
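Here is a minimal corrected sketch of this attempt. It is an editorial illustration, not the poster's code. It creates the list once, before the loop, and it loops over column names rather than column values, so each name can serve as the list key:

```r
mydf <- data.frame(
  color  = c("blue", "yellow", "red", "green", "pink", "orange", "cyan"),
  height = c(1, 2, 3, 4, 5, 6, 7),
  boy_1  = c(5, 1, 6, 5, 5, 1, 4),
  boy_2  = c(2, 2, 2, 2, 2, 2, 2),
  boy_3  = c(3, 3, 3, 3, 3, 3, 3),
  girl_1 = c(3, 3, 3, 4, 4, 4, 4),
  girl_2 = c(6, 6, 6, 6, 6, 6, 6)
)

mylist <- list()                     # create the list once, before the loop
for (col in names(mydf)[-(1:2)]) {   # loop over column names, not column values
  # keep color and height, plus the current column, stored under its name
  mylist[[col]] <- mydf[, c("color", "height", col)]
}

names(mylist)
```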

EDIT: Okay, I spent some more time and I think I've got it!

mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"),
                   "height"=c(1,2,3,4,5,6,7),
                   "boy_1"=c(5,1,6,5,5,1,4),
                   "boy_2"=c(2,2,2,2,2,2,2),
                   "boy_3"=c(3,3,3,3,3,3,3),
                   "girl_1"=c(3,3,3,4,4,4,4),
                   "girl_2"=c(6,6,6,6,6,6,6))

mylist <- list()
for(i in seq_along(mydf[,3:ncol(mydf)])){   # i = 1, 2, ..., 5 (one per value column)

  mylist[[i]] <- mydf[,c(1,2,(i+2))]        # columns 1-2 (color, height) plus value column i + 2

}

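This seems to return a list with the correct 5 data frames. It took me a while to arrive at the 'i+2' trick: i counts the value columns starting from 1, but those columns start at position 3 in mydf, so the offset is 2.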
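Here is a sketch of the same loop with the offset made explicit. It assumes the dummy data above, and `value_idx` is a hypothetical helper variable. Preallocating the list and naming its elements afterwards are optional refinements, not part of the original solution:

```r
mydf <- data.frame(
  color  = c("blue", "yellow", "red", "green", "pink", "orange", "cyan"),
  height = c(1, 2, 3, 4, 5, 6, 7),
  boy_1  = c(5, 1, 6, 5, 5, 1, 4),
  boy_2  = c(2, 2, 2, 2, 2, 2, 2),
  boy_3  = c(3, 3, 3, 3, 3, 3, 3),
  girl_1 = c(3, 3, 3, 4, 4, 4, 4),
  girl_2 = c(6, 6, 6, 6, 6, 6, 6)
)

value_idx <- 3:ncol(mydf)                      # positions of the value columns in mydf
mylist <- vector("list", length(value_idx))    # preallocate one slot per data frame
for (i in seq_along(value_idx)) {
  # i runs 1..5 while value_idx[i] runs 3..7, i.e. value_idx[i] == i + 2
  mylist[[i]] <- mydf[, c(1, 2, value_idx[i])]
}
names(mylist) <- names(mydf)[value_idx]        # label each element by its value column
```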

martinjhnhadley · June 3, 2021, 3:01pm

cwright1:

mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"),
                   "height"=c(1,2,3,4,5,6,7),
                   "boy_1"=c(5,1,6,5,5,1,4),
                   "boy_2"=c(2,2,2,2,2,2,2),
                   "boy_3"=c(3,3,3,3,3,3,3),
                   "girl_1"=c(3,3,3,4,4,4,4),
                   "girl_2"=c(6,6,6,6,6,6,6))

If you're happy with a {tidyverse} solution, then here's how I would go about it:

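Use pivot_longer() to pivot the data into a longer format (a `name` column holding the original column names and a `value` column holding their values)
group_split() lets you split a tibble into a list of tibbles by a grouping column, but it returns a gnarly list_of object, so I use as.list() to turn that into a normal list of tibbles
I then use map() to apply pivot_wider() to each element of this list, turning each piece back into a clean, wide tibble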

library("tidyverse")

mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"),
                   "height"=c(1,2,3,4,5,6,7),
                   "boy_1"=c(5,1,6,5,5,1,4),
                   "boy_2"=c(2,2,2,2,2,2,2),
                   "boy_3"=c(3,3,3,3,3,3,3),
                   "girl_1"=c(3,3,3,4,4,4,4),
                   "girl_2"=c(6,6,6,6,6,6,6))

mydf_split <- mydf %>% 
  pivot_longer(boy_1:girl_2) %>% 
  group_split(name) %>% 
  as.list() %>% 
  map(pivot_wider)

mydf_split
#> [[1]]
#> # A tibble: 7 x 3
#>   color  height boy_1
#>   <chr>   <dbl> <dbl>
#> 1 blue        1     5
#> 2 yellow      2     1
#> 3 red         3     6
#> 4 green       4     5
#> 5 pink        5     5
#> 6 orange      6     1
#> 7 cyan        7     4
#> 
#> [[2]]
#> # A tibble: 7 x 3
#>   color  height boy_2
#>   <chr>   <dbl> <dbl>
#> 1 blue        1     2
#> 2 yellow      2     2
#> 3 red         3     2
#> 4 green       4     2
#> 5 pink        5     2
#> 6 orange      6     2
#> 7 cyan        7     2
#> 
#> [[3]]
#> # A tibble: 7 x 3
#>   color  height boy_3
#>   <chr>   <dbl> <dbl>
#> 1 blue        1     3
#> 2 yellow      2     3
#> 3 red         3     3
#> 4 green       4     3
#> 5 pink        5     3
#> 6 orange      6     3
#> 7 cyan        7     3
#> 
#> [[4]]
#> # A tibble: 7 x 3
#>   color  height girl_1
#>   <chr>   <dbl>  <dbl>
#> 1 blue        1      3
#> 2 yellow      2      3
#> 3 red         3      3
#> 4 green       4      4
#> 5 pink        5      4
#> 6 orange      6      4
#> 7 cyan        7      4
#> 
#> [[5]]
#> # A tibble: 7 x 3
#>   color  height girl_2
#>   <chr>   <dbl>  <dbl>
#> 1 blue        1      6
#> 2 yellow      2      6
#> 3 red         3      6
#> 4 green       4      6
#> 5 pink        5      6
#> 6 orange      6      6
#> 7 cyan        7      6
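The pivot, split, and widen idea does not depend on the tidyverse. Below is a rough base R sketch of the same three steps that uses only base R. `long` and `pieces` are illustrative names. The `name` and `value` columns mirror pivot_longer()'s default names. Here split() orders the groups by sorted name, which matches the original column order:

```r
mydf <- data.frame(
  color  = c("blue", "yellow", "red", "green", "pink", "orange", "cyan"),
  height = c(1, 2, 3, 4, 5, 6, 7),
  boy_1  = c(5, 1, 6, 5, 5, 1, 4),
  boy_2  = c(2, 2, 2, 2, 2, 2, 2),
  boy_3  = c(3, 3, 3, 3, 3, 3, 3),
  girl_1 = c(3, 3, 3, 4, 4, 4, 4),
  girl_2 = c(6, 6, 6, 6, 6, 6, 6)
)
value_cols <- setdiff(names(mydf), c("color", "height"))

# Step 1 (like pivot_longer): stack the value columns into name/value pairs,
# repeating color and height once per value column
long <- data.frame(
  color  = rep(mydf$color, times = length(value_cols)),
  height = rep(mydf$height, times = length(value_cols)),
  name   = rep(value_cols, each = nrow(mydf)),
  value  = unlist(mydf[value_cols], use.names = FALSE)
)

# Step 2 (like group_split): one data frame per distinct value of `name`
pieces <- split(long, long$name)

# Step 3 (like pivot_wider): rename `value` after its group and drop `name`;
# lapply() keeps the names that split() assigned
mydf_split <- lapply(pieces, function(piece) {
  out <- data.frame(color = piece$color, height = piece$height)
  out[[piece$name[1]]] <- piece$value   # e.g. a column called "boy_1"
  out
})
```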

Instead of creating MANY separate variables, how about naming the elements of the list of tibbles? We can extract a name for each data frame (the name of its third column) as follows:

map_chr(mydf_split, ~names(.x[3]))

And we can add these names to the list as follows:

names(mydf_split) <- map_chr(mydf_split, ~names(.x[3]))

Now we can extract girl_1 as follows:

mydf_split$girl_1
#> # A tibble: 7 x 3
#>   color  height girl_1
#>   <chr>   <dbl>  <dbl>
#> 1 blue        1      3
#> 2 yellow      2      3
#> 3 red         3      3
#> 4 green       4      4
#> 5 pink        5      4
#> 6 orange      6      4
#> 7 cyan        7      4
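Because the list is named, you can also look up a data frame using a name stored in a variable, and you can process the whole list in one call. Here is a short base R sketch. `wanted` and the per-data-frame mean are illustrative choices only:

```r
mydf <- data.frame(
  color  = c("blue", "yellow", "red", "green", "pink", "orange", "cyan"),
  height = c(1, 2, 3, 4, 5, 6, 7),
  boy_1  = c(5, 1, 6, 5, 5, 1, 4),
  boy_2  = c(2, 2, 2, 2, 2, 2, 2),
  boy_3  = c(3, 3, 3, 3, 3, 3, 3),
  girl_1 = c(3, 3, 3, 4, 4, 4, 4),
  girl_2 = c(6, 6, 6, 6, 6, 6, 6)
)
value_cols <- setdiff(names(mydf), c("color", "height"))
mydf_split <- setNames(
  lapply(value_cols, function(col) mydf[, c("color", "height", col)]),
  value_cols
)

# `$` needs the name written literally; `[[` also accepts a string held in a variable
wanted <- "girl_1"
mydf_split[[wanted]]

# Apply a function to every data frame; vapply() keeps the list's names
means <- vapply(mydf_split, function(d) mean(d[[3]]), numeric(1))
means
```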

system · June 24, 2021, 3:02pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
