Identify dataframe inside a for loop

I have this chunk of code:

# Select the columns
WD_selected <- WD %>%
  select(Div, Date, Time, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR, HS, AS, HST, AST, HF, AF, HC, AC, HY, AY, HR, AR)

which applies to one data frame.

I want to do the same for 30 data frames.

In the following chunk of code how can I identify a dataframe inside the curly brackets?

What should I write instead of i_selected <- i?

Nome_Base <- "I1"
Numero_Ficheiros <- 30
Numero_Ficheiros <- 1:Numero_Ficheiros
Lista_Data_Frames <- paste0(Nome_Base, "_", Numero_Ficheiros)

for (i in Lista_Data_Frames) {
  i_selected <- i %>%
    select(Div, Date, Time, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR, HS, AS, HST, AST, HF, AF, HC, AC, HY, AY, HR, AR)
  
}

Hello,

This depends a bit on where your data frames come from. Are you loading them from 30 different files, or are you creating the 30 data frames in earlier code (which is likely less optimal)? Also, what do you want the output to be: 30 modified data frames, 30 new ones with the old ones kept, or one data frame that combines everything?

Here is one example:

library(tidyverse)

#Data frames you want to process
df1 = data.frame(x =runif(5), y = 1:5, z = LETTERS[1:5])
df2 = data.frame(x =runif(5), y = 1:5, z = LETTERS[11:15])

#Group them in a list
allDF = list(df1, df2)
allDF
#> [[1]]
#>           x y z
#> 1 0.7719568 1 A
#> 2 0.7730725 2 B
#> 3 0.2268333 3 C
#> 4 0.4621981 4 D
#> 5 0.9618335 5 E
#> 
#> [[2]]
#>             x y z
#> 1 0.008839024 1 K
#> 2 0.776059908 2 L
#> 3 0.808872494 3 M
#> 4 0.131816422 4 N
#> 5 0.766864560 5 O

#Iterate over the list, returning the modified dataframe
newDF = lapply(allDF, function(df){
  df %>% select(x, z)
})
newDF
#> [[1]]
#>           x z
#> 1 0.7719568 A
#> 2 0.7730725 B
#> 3 0.2268333 C
#> 4 0.4621981 D
#> 5 0.9618335 E
#> 
#> [[2]]
#>             x z
#> 1 0.008839024 K
#> 2 0.776059908 L
#> 3 0.808872494 M
#> 4 0.131816422 N
#> 5 0.766864560 O

Created on 2023-02-16 with reprex v2.0.2
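
If the goal is instead the third option (one data frame that combines everything), the same list approach can be extended. This is only a minimal sketch: the list names and the "source" column name are assumptions made for illustration, and bind_rows() from dplyr does the stacking.

```r
library(dplyr)

# Example data, mirroring df1 and df2 above
df1 <- data.frame(x = runif(5), y = 1:5, z = LETTERS[1:5])
df2 <- data.frame(x = runif(5), y = 1:5, z = LETTERS[11:15])

# A named list lets bind_rows() record which data frame each row came from
allDF <- list(df1 = df1, df2 = df2)

# Select the columns in each data frame, then stack the results into one
combined <- allDF %>%
  lapply(function(df) df %>% select(x, z)) %>%
  bind_rows(.id = "source")

combined
```

The "source" column holds the list names, so each row can still be traced back to its original data frame.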

Hope this helps,
PJ

@pieterjanvc

I adapted the code you gave as shown:

newDF = lapply(Lista_Data_Frames, function(df){
  df %>% select(Div, Date, Time, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR, HS, AS, HST, AST, HF, AF, HC, AC, HY, AY, HR, AR)
})

and I received this error message:

Error in UseMethod("select") : 
no applicable method for 'select' applied to an object of class "character"

Can you help me?

The example you were shown demonstrated how to work directly with data frames that are stored in a list. It seems you don't have that: what you have is a character vector containing the names of the data frames. To go from a name to the object itself you can use get() or mget(). Call get() inside the function on the name being processed, and apply select() to what get() returns.
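
As a rough sketch of the mget() route, assuming the data frames already exist in the workspace under the names built with paste0(): the two small data frames and the shortened column list below are made up for illustration, and only the naming scheme comes from the question.

```r
library(dplyr)

# Hypothetical stand-ins for two of the 30 data frames (I1_1, I1_2, ...)
I1_1 <- data.frame(Div = "I1", HomeTeam = "A", AwayTeam = "B",
                   FTHG = 2, FTAG = 1, Other = "x")
I1_2 <- data.frame(Div = "I1", HomeTeam = "C", AwayTeam = "D",
                   FTHG = 0, FTAG = 0, Other = "y")

# Names of the data frames, built the same way as in the question
Lista_Data_Frames <- paste0("I1", "_", 1:2)

# mget() looks up all the names at once and returns a named list of data frames
Lista_Selected <- lapply(mget(Lista_Data_Frames), function(df) {
  df %>% select(Div, HomeTeam, AwayTeam, FTHG, FTAG)
})

names(Lista_Selected)
```

Because mget() returns a named list, each result can be retrieved by its original name, for example Lista_Selected$I1_1.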


Thanks @nirgrahamuk for that clarification. I didn't even know about the get() function myself, that's cool!

So @ramgouveia to update the example with this new piece of info:

library(tidyverse)

#Data frames you want to process
df1 = data.frame(x =runif(5), y = 1:5, z = LETTERS[1:5])
df2 = data.frame(x =runif(5), y = 1:5, z = LETTERS[11:15])

#Store the names of the data frames in a character vector
allDF = c("df1", "df2")

#Iterate over the names, fetching each data frame with get() and returning the modified version
newDF = lapply(allDF, function(df){
  get(df) %>% select(x, z)
})
newDF
#> [[1]]
#>           x z
#> 1 0.5724255 A
#> 2 0.7609488 B
#> 3 0.8762743 C
#> 4 0.3370264 D
#> 5 0.3034398 E
#> 
#> [[2]]
#>           x z
#> 1 0.5020190 K
#> 2 0.2558775 L
#> 3 0.8546706 M
#> 4 0.3392919 N
#> 5 0.5135671 O

Created on 2023-02-17 with reprex v2.0.2
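
For completeness, the original for loop can also be made to work by pairing get() with base R's assign(), which creates a variable from a name held in a string. This is a sketch under assumptions: the two small data frames, the shortened column list, and the "_selected" suffix are illustrative, not taken from the real data.

```r
library(dplyr)

# Hypothetical stand-ins for two of the 30 data frames
I1_1 <- data.frame(Div = "I1", HomeTeam = "A", AwayTeam = "B",
                   FTHG = 2, FTAG = 1, Other = "x")
I1_2 <- data.frame(Div = "I1", HomeTeam = "C", AwayTeam = "D",
                   FTHG = 0, FTAG = 0, Other = "y")

Lista_Data_Frames <- paste0("I1", "_", 1:2)

for (i in Lista_Data_Frames) {
  # get(i) fetches the data frame whose name is stored in i
  selected <- get(i) %>% select(Div, HomeTeam, AwayTeam, FTHG, FTAG)
  # assign() creates a new variable named e.g. "I1_1_selected"
  assign(paste0(i, "_selected"), selected)
}

names(I1_1_selected)
```

This leaves 30 extra objects in the workspace, so keeping the results in a single list, as in the lapply() version, is usually easier to manage.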

Hope this helps,
PJ
