Extracting columns from many datasets into one main data frame. Implement the operation in a FOR-loop

trymkok · April 21, 2020, 8:53am

Hi, all.

I would like to make seperate datasets for the years 2015, 2016 and 2017. The way I did it for one year is like the following:

Merging datasets:

df_2017<- data.frame(cars$2017, bikes$2017, motorcycle$2017, trucks$2017, cars_p$2017,
bikes_p$2017, motorcycle_p$2017, truck_p$2017)

cars, bikes, motorcycle etc are other small dataframes that I would like to extract information for only one particular year and gather the information from all the different datasets into one. So the gathered data.frame for 2017 would be df_2017. The name of the columns are the years (2015, 2016, 2017)

Does anyone know how I can implement this in a FOR-loop (or by vectorization) so I dont have to write the code over and over again?

Thanks!

siddharthprabhu · April 21, 2020, 9:47am

You could use purrr:map_dfc() to achieve this.

Here I'm illustrating on 3 toy data frames a, b and c. Note that naming each element of df_list before calling map_dfc() ensures that each output column's name matches the tibble from where that column was extracted.

library(tidyverse)

a <- tibble(`2017` = seq(1, 10), `2018` = seq(11, 20))
b <- tibble(`2017` = seq(21, 30), `2018` = seq(31, 40))
c <- tibble(`2017` = seq(41, 50), `2018` = seq(51, 60))

df_list <- list(a = a, b = b, c = c)

map_dfc(df_list, ~ .x$`2017`)
#> # A tibble: 10 x 3
#>        a     b     c
#>    <int> <int> <int>
#>  1     1    21    41
#>  2     2    22    42
#>  3     3    23    43
#>  4     4    24    44
#>  5     5    25    45
#>  6     6    26    46
#>  7     7    27    47
#>  8     8    28    48
#>  9     9    29    49
#> 10    10    30    50

^{Created on 2020-04-21 by the reprex package (v0.3.0)}

trymkok · April 21, 2020, 11:39am

Thanks a lot! I will try it out!

trymkok · April 21, 2020, 11:50am

Do you know how I can loop this sequence so that I make a tibble for each year?

I tried:

for (i in 2013:2017) {
df_i <- map_dfc(df_list, ~ .x$i)
}

But it didn't work.

Thanks

Yarnabrina · April 21, 2020, 12:08pm

Continuing the same example provided by Siddharth, you can do something like this:

library(purrr)
library(tibble)

a <- tibble(`2017` = 1:10, `2018` = 11:20)
b <- tibble(`2017` = 21:30, `2018` = 31:40)
c <- tibble(`2017` = 41:50, `2018` = 51:60)

df_list <- list(a = a, b = b, c = c)
year_list <- c(2017, 2018)

walk(.x = year_list,
     .f = function(year)
     {
         assign(x = paste("df", year,
                          sep = "_"),
                value = map_dfc(.x = df_list,
                                .f = ~ .x[[as.character(x = year)]]),
                envir = .GlobalEnv)
     })

df_2017
#> # A tibble: 10 x 3
#>        a     b     c
#>    <int> <int> <int>
#>  1     1    21    41
#>  2     2    22    42
#>  3     3    23    43
#>  4     4    24    44
#>  5     5    25    45
#>  6     6    26    46
#>  7     7    27    47
#>  8     8    28    48
#>  9     9    29    49
#> 10    10    30    50

df_2018
#> # A tibble: 10 x 3
#>        a     b     c
#>    <int> <int> <int>
#>  1    11    31    51
#>  2    12    32    52
#>  3    13    33    53
#>  4    14    34    54
#>  5    15    35    55
#>  6    16    36    56
#>  7    17    37    57
#>  8    18    38    58
#>  9    19    39    59
#> 10    20    40    60

There are two reasons for which your code fails.

First one is that df_i won't recognise the year, it'll simply use a variable df_i for all year, so data.frame for each year will replace the previous one. You don't want to do that.

Second, your code tries to access the column with name literally i, and not it's value. To use the value, `[[` is the better function than `$`. Also, you have to convert i to a character, as column names are character and not integer.

Hope this helps.

trymkok · April 21, 2020, 12:11pm

That worked perfectly! Thanks a lot!

trymkok · April 21, 2020, 12:28pm

Okey, thanks! Especially on the comment on converting i to a character. That was VERY useful.
After doing that and going from $ to [[, my loop went through, but as you said it only made one data.frame. Is there a way to store the previous years?

trymkok · April 21, 2020, 1:36pm

An what should I do if I want to change the names of the columns in an effective way?

system · April 28, 2020, 5:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.