Creating new dataframes inside a list from a single dataframe, using for loop

PaulaV · September 10, 2020, 10:18am

I have a dataframe with a character column and 4 numeric columns.
What I want to do is to apply filtering according to each column and create new data frames from each of them.

Here is an example data frame:

df <- data.frame(my_names=sample(LETTERS,4,replace=F),
                 column2=sample(1.3:100.3,4,replace=T),
                 column3=sample(1.3:100.3,4,replace=T),
                 column4=sample(1.3:100.3,4,replace=T),
                 column5=sample(1.3:100.3,4,replace=T))
> df
  my_names column2 column3 column4 column5
1        A     8.3     1.3    19.3    91.3
2        E    18.3    42.3     8.3    76.3
3        O     6.3    46.3    26.3    91.3
4

What I want to get, for one column (here column2) is this:

top <- 2
top_column2 <- df %>% arrange(desc(column2)) %>% 
  select(my_names, column2) %>% 
  top_n(top, column2) %>% 
  mutate(column2=round(column2))
> top_column2
  my_names column2
1        O      76
2        E      75

This works perfectly fine.

Now I would like to iterate both the naming and the creation of the data frames over columns 2:5.

I first tried doing:

> for(i in 2:ncol(df)) {
+   paste("top", colnames(df)[i], sep="_") <- 
+     df %>% arrange(desc(i)) %>% select(my_names, i) %>% top_n(top, i) %>% mutate(i=round(i))
+ }
Error in paste("top", colnames(df)[i], sep = "_") <- df %>% arrange(desc(i)) %>%  : 
  target of assignment expands to non-language object

After googling, I tried assign() together with paste():

for(i in 2:ncol(df)) {
  assign(paste("top", colnames(df)[i], sep="_"), 
    df %>% arrange(desc(i)) %>% select(my_names, i) %>% top_n(top, i) %>% mutate(i=round(i))) 
}
> top_column4
  my_names column4 i
1        A    19.3 4
2        E     8.3 4
3        O    26.3 4
4        M    59.3 4

This creates 4 data frames with the right names, but the functions are not applied and a new column is created.

After more googling, I tried starting with an empty list and then filling in its elements:

test_list <- list()
for(i in 2:ncol(df)) {
  test_list[[paste("top", colnames(df)[i], sep="_")]] <- 
    df %>% arrange(desc(i)) %>% select(my_names, i) %>% top_n(top, i) %>% mutate(i=round(i))
}
> test_list$top_column4
  my_names column4 i
1        A    19.3 4
2        E     8.3 4
3        O    26.3 4
4        M    59.3 4

But this has the same problem as assign(): the data frames are created with the right names, but not the right content.

Could someone please help to solve this issue?

pieterjanvc · September 10, 2020, 12:43pm

Hi,

Welcome to the RStudio community!

I'm not entirely sure I understood what you were going for, but here is an example based on my interpretation:

library(dplyr) #Need version 1.0+

#Generate the data
set.seed(1)
df <- data.frame(my_names=sample(LETTERS,4,replace=F),
                 column2=sample(1.3:100.3,4,replace=T),
                 column3=sample(1.3:100.3,4,replace=T),
                 column4=sample(1.3:100.3,4,replace=T),
                 column5=sample(1.3:100.3,4,replace=T))

#Extract the data frames
top =  2
roundDecimals = 0

dfList = lapply(colnames(df)[-1], function(myCol){
  df %>% select(my_names, all_of(myCol)) %>%  
    arrange(across(all_of(myCol))) %>% slice(1:top) %>% 
    mutate(across(all_of(myCol), function(x) round(x, roundDecimals)))
})

dfList
#> [[1]]
#>   my_names column2
#> 1        A      14
#> 2        Y      34
#> 
#> [[2]]
#>   my_names column3
#> 1        G      51
#> 2        D      59
#> 
#> [[3]]
#>   my_names column4
#> 1        D      21
#> 2        G      54
#> 
#> [[4]]
#>   my_names column5
#> 1        Y       7
#> 2        D      73

Created on 2020-09-10 by the reprex package (v0.3.0)

I used lapply() to iterate over all columns by name, then tried out some of the new dplyr 1.0+ magic. I'm not very familiar with this yet, so there could be a more elegant solution, but in essence I used the new all_of() function to turn a column name stored in a character variable into a selection of that column. The across() function then lets you apply a function to the selected columns.
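
Note that arrange() sorts in ascending order by default, so slice(1:top) in the code above keeps the smallest values of each column. If the largest values are wanted, as in the original question, something like the following might work. This is only a minimal sketch: it assumes dplyr 1.0 or later, uses made-up values so the result is reproducible, and the list names ("top_column2" and so on) are simply built with paste().

```r
library(dplyr) # assumes dplyr >= 1.0.0

# Small data frame with made-up values (not the data from the thread)
df <- data.frame(my_names = c("A", "B", "C", "D"),
                 column2 = c(8.3, 18.3, 6.3, 59.3),
                 column3 = c(1.3, 42.3, 46.3, 12.3))

top <- 2
valueCols <- colnames(df)[-1]

dfList <- lapply(valueCols, function(myCol) {
  df %>%
    select(my_names, all_of(myCol)) %>%
    # Passing desc to across() sorts from largest to smallest
    arrange(across(all_of(myCol), desc)) %>%
    slice(1:top) %>%
    mutate(across(all_of(myCol), round))
})

# Name each element after its column, e.g. "top_column2"
names(dfList) <- paste("top", valueCols, sep = "_")

dfList$top_column2
```

With these values, dfList$top_column2 should contain rows D and B. Unlike top_n(), slice() does not keep ties, so it always returns exactly `top` rows.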

Sorry for the weak explanation. I had to fiddle with it myself to get it working, and I'm not yet sure I did everything the correct way, but it does at least seem to produce results.

Hope this helps,
PJ

nirgrahamuk · September 11, 2020, 10:04am

Here is an alternative.
I reused your original code for producing the column2 table more directly: I turned it into a function and iterated it over the columns with purrr::map.

library(tidyverse)
library(rlang)
df <- data.frame(my_names=sample(LETTERS,4,replace=F),
                      column2=sample(1.3:100.3,4,replace=T),
                      column3=sample(1.3:100.3,4,replace=T),
                      column4=sample(1.3:100.3,4,replace=T),
                      column5=sample(1.3:100.3,4,replace=T))
(cols <- setdiff(names(df),"my_names"))

results1<-purrr::map(.x=cols,
  ~{df %>% arrange(desc(!!sym(.x))) %>% 
  select(my_names, all_of(.x)) %>% # all_of() avoids the warning about selecting with an external character vector
  top_n(2, !!sym(.x)) %>% 
  mutate(!!sym(.x):=round(!!sym(.x)))})

results1 %>% setNames(paste0("top_",cols))

# $top_column2
# my_names column2
# 1        H      79
# 2        K      36
# 
# $top_column3
# my_names column3
# 1        K      60
# 2        H      55
# 
# $top_column4
# my_names column4
# 1        P      76
# 2        K      58
# 
# $top_column5
# my_names column5
# 1        K      99
# 2        R      87
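
The for loop from the question can also be made to work. In the original loop, `i` is an integer, so desc(i) sorts by a constant rather than by a column, and mutate(i=round(i)) creates a new column literally named `i`. One possible fix, sketched here assuming dplyr 1.0 or later, is to loop over the column names and refer to the column through the .data pronoun and all_of(). The data values and the name top_list are made up for illustration.

```r
library(dplyr) # assumes dplyr >= 1.0.0

# Small data frame with made-up values (not the data from the thread)
df <- data.frame(my_names = c("A", "E", "O", "M"),
                 column2 = c(8.3, 18.3, 6.3, 59.3),
                 column3 = c(1.3, 42.3, 46.3, 12.3))

top <- 2
top_list <- list()

for (col in colnames(df)[-1]) {
  top_list[[paste("top", col, sep = "_")]] <- df %>%
    # .data[[col]] looks up the column whose name is stored in `col`
    arrange(desc(.data[[col]])) %>%
    select(my_names, all_of(col)) %>%
    # Keep the first `top` rows after sorting (ties are not kept)
    slice_head(n = top) %>%
    mutate(across(all_of(col), round))
}

top_list$top_column3
```

With these values, top_list$top_column3 should contain rows O and E. Keeping the results in a named list rather than calling assign() means they can be processed further with lapply() or purrr::map().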
