Loop through data frames

Hi All,

I have a series of data frames: USA, Canada, Mexico and so on. How can I structure a loop in R so that, no matter how many data frames there are, the same data cleaning steps are applied to each one? For example, the step below should be applied to USA, Canada and Mexico in a loop.

USA <- df %>%
  gather(key = "Year", value = "Volume", Jan:Dec)

Thanks for your help!

How are you loading these data frames into your environment?
Do you have a list of them?

Thanks @nirgrahamuk!
I didn't have a list of them before, but I have now created one as follows. I am very new to looping in R, so any help will be appreciated. Thank you!

dfList <- list(USA, Canada, Mexico)

The general solution to this is of the form:

library(purrr)

myresult <- map(mylist,
                ~myfunc(.))
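As a side note (a sketch, not part of the original reply): the ~myfunc(.) formula is only a shorthand, so passing the function itself, or using base R's lapply(), should give the same result. Here myfunc is a hypothetical placeholder that keeps the first two rows, and the built-in iris and mtcars data stand in for real data frames.

```r
library(purrr)

# Hypothetical cleaning function: keep only the first two rows
myfunc <- function(df) head(df, 2)

mylist <- list(iris, mtcars)

res_formula  <- map(mylist, ~myfunc(.))  # formula shorthand
res_function <- map(mylist, myfunc)      # pass the function directly
res_base     <- lapply(mylist, myfunc)   # base R equivalent

identical(res_formula, res_function)
identical(res_function, res_base)
```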

In your case, I imagine the function would be

myfunc <- function(x){
  gather(data = x,
         key = "Year", value = "Volume", Jan:Dec)
}

and mylist would be your dfList.
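Putting the pieces together, here is a self-contained sketch. The small data frames are made up to stand in for USA, Canada and Mexico, and their layout (a Product column plus one column per month, only Jan:Mar here instead of Jan:Dec) is an assumption. In current tidyr, gather() is superseded by pivot_longer(), which could be used inside myfunc in the same way.

```r
library(purrr)
library(tidyr)

# Hypothetical toy data: one row per product, one column per month
make_toy <- function(scale) {
  data.frame(Product = c("A", "B"),
             Jan = c(1, 2) * scale,
             Feb = c(3, 4) * scale,
             Mar = c(5, 6) * scale)
}
USA    <- make_toy(1)
Canada <- make_toy(2)
Mexico <- make_toy(3)

dfList <- list(USA, Canada, Mexico)

# The cleaning step, written once for a single data frame
myfunc <- function(x) {
  gather(data = x, key = "Year", value = "Volume", Jan:Mar)
}

# Apply it to every element; the result is a list of long-format data frames
myresult <- map(dfList, ~myfunc(.))
myresult[[2]]
```

However many data frames are in dfList, the same call handles all of them.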

Thanks @nirgrahamuk!
When I try gather alone, as you suggested, it works. But if I add other cleaning steps, the result does not change at all. Here is what I was trying to do:

Created a list of data frames:

dfList <- list(USA, Canada, Mexico)

Created a function for initial cleanup:

market <- function(x){
  x%>%
  filter(str_detect(2, "NA", negate = TRUE))%>%
  select(1:25, -2)%>%
  gather(key = "Year", value = "Volume", -1)
  
  return(x)
}

Iterating through the list:

Northen_Market <- map(dfList,
                ~market(.))

The result in Northen_Market is the same as the original USA, Canada and Mexico, without any of the changes made in the function. I am sure I am missing something very important in the function, or perhaps pipes cannot be used in a function. Can you please help? How should I structure it?

Thank you!

When you pipe forward, you create a new object, but x itself is not modified. To return the changed object, you need to assign it to a name,
either like this:

library(dplyr)
library(stringr)
library(tidyr)

market <- function(x){
  x <- x %>%
    # .[[2]] is the second column; str_detect(2, ...) would test the number 2 itself
    filter(str_detect(.[[2]], "NA", negate = TRUE)) %>%
    select(1:25, -2) %>%
    gather(key = "Year", value = "Volume", -1)

  return(x)
}

or

market <- function(x){
  x %>%
    filter(str_detect(.[[2]], "NA", negate = TRUE)) %>%
    select(1:25, -2) %>%
    gather(key = "Year", value = "Volume", -1) -> x

  return(x)
}

Both assign your result back to the same name, x, so these two versions are equivalent.

library(magrittr) also provides a two-way (assignment) pipe, %<>%, that achieves the same thing:

library(magrittr)  # provides %<>%

market <- function(x){
  x %<>%
    filter(str_detect(.[[2]], "NA", negate = TRUE)) %>%
    select(1:25, -2) %>%
    gather(key = "Year", value = "Volume", -1)

  return(x)
}
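To check this end to end, here is a small sketch on made-up data. The column layout (Product first, a Status text column second in which the string "NA" marks rows to drop, then one column per month) is an assumption, and because the toy frames have only four columns, select(-2) stands in for select(1:25, -2). The second function also relies on the fact that an R function returns the value of its last expression, so ending it with the pipe chain itself works too.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(purrr)

# Hypothetical stand-ins for USA, Canada and Mexico
make_toy <- function(scale) {
  data.frame(Product = c("A", "B", "C"),
             Status  = c("ok", "NA", "ok"),
             Jan = c(1, 2, 3) * scale,
             Feb = c(4, 5, 6) * scale)
}
dfList <- list(USA = make_toy(1), Canada = make_toy(2), Mexico = make_toy(3))

# Original version: the piped result is computed, then thrown away
market_unchanged <- function(x) {
  x %>%
    filter(str_detect(.[[2]], "NA", negate = TRUE)) %>%
    select(-2) %>%
    gather(key = "Year", value = "Volume", -1)
  return(x)
}

# Fixed version: no return(), so the pipe chain is the last expression
# and its value is what the function returns
market <- function(x) {
  x %>%
    filter(str_detect(.[[2]], "NA", negate = TRUE)) %>%  # drop rows whose 2nd column contains "NA"
    select(-2) %>%                                         # then drop that column
    gather(key = "Year", value = "Volume", -1)             # months to long format
}

identical(map(dfList, ~market_unchanged(.)), dfList)  # nothing changed
Northen_Market <- map(dfList, ~market(.))
Northen_Market$USA
```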

Thank you so much @nirgrahamuk! I opted for the first option, as that's the kind of assignment I am used to. One last question on this subject: after I am done cleaning the data frames in the list, how can I convert them back into separate data frames such as USA and Canada? Will it be unlist(dfList), and if so, will it preserve the original data frame names (USA, etc.)?

Thank you!

They are already data frames; they are just data frames stored in a list.
Northen_Market[[1]] is the transformed USA,
Northen_Market[[2]] is the transformed Canada, and so on.
If you want to pick them out by name, you should name them as you insert them into your original dfList.
For example:

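library(purrr)

# name them as you add them: the name goes on the left of the = sign, the data frame on the right
dfList <- list(iris = iris, mtcars = mtcars)
new_list <- map(dfList,
                ~head(.))

# access them one at a time, by name
new_list$iris
new_list$mtcars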
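On the unlist() part of the question, a sketch that goes beyond the reply above: unlist() flattens the data frames into one long named vector, so it does not give them back. If separate objects really are wanted, base R's list2env() creates one object per named element in a chosen environment. The small data frames here are made-up stand-ins for the cleaned results.

```r
# Made-up stand-ins for the cleaned, named list of data frames
Northen_Market <- list(USA    = data.frame(Year = "Jan", Volume = 1),
                       Canada = data.frame(Year = "Jan", Volume = 2),
                       Mexico = data.frame(Year = "Jan", Volume = 3))

# unlist() flattens everything into a single named vector of values
str(unlist(Northen_Market))

# list2env() creates USA, Canada and Mexico as separate objects instead
invisible(list2env(Northen_Market, envir = globalenv()))
Canada
```

Keeping them in the named list is often simpler, since further steps can still be applied to all of them with map().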


Perfect! Thank you so much! Much appreciated @nirgrahamuk!

I also wanted to create a new column called Country while looping through the list, using the list names to tell the data frames apart, but it doesn't seem to work with lists the way it works with variable names.

Northen_Market <- Northen_Market %>%
  mutate(Country = ifelse(dfList == Northen_Market$USA, "USA",
                   ifelse(dfList == Northen_Market$Canada, "Canada",
                   ifelse(dfList == Northen_Market$Mexico, "Mexico", ""))))

Can we set the column value based on the data frame's name, or is the only way to mutate each data frame separately?

Thanks again for your help!

A list cannot contain a column; only data frames can contain columns, and Northen_Market is a list of data frames.

Here I go through a list of two data frames by their names and add a column to each one saying what its name was:

library(dplyr)    # mutate(), as_tibble()
library(ggplot2)  # provides the mpg dataset

list_of_frames <- list(iris = as_tibble(iris), mpg = mpg)

# iterate over the names 'iris' then 'mpg'; the . symbol is replaced by each name in turn
altered_list_of_frames <- purrr::map(names(list_of_frames),
                                     ~mutate(list_of_frames[[.]], src_col = .))
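Mapping over names() returns an unnamed list. As a sketch of two alternatives not used in the reply above: purrr::imap() passes each element as .x and its name as .y and keeps the names on the result, while dplyr::bind_rows() with .id stacks everything into one data frame and records each element's list name in a new column. The small data frames are made-up stand-ins for the cleaned Northen_Market list.

```r
library(dplyr)
library(purrr)

# Made-up stand-ins for the cleaned, named list of data frames
Northen_Market <- list(USA    = data.frame(Year = c("Jan", "Feb"), Volume = 1:2),
                       Canada = data.frame(Year = c("Jan", "Feb"), Volume = 3:4),
                       Mexico = data.frame(Year = c("Jan", "Feb"), Volume = 5:6))

# Option 1: add a Country column to each data frame, keeping the list
with_country <- imap(Northen_Market, ~mutate(.x, Country = .y))
with_country$Canada

# Option 2: one combined data frame; .id fills Country from the list names
all_markets <- bind_rows(Northen_Market, .id = "Country")
all_markets
```

The combined form is often handier for later grouping or plotting by Country.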
