Group_by on elements of a large list

Hi All,

I've only recently started coding and am totally stuck!

I have a large list (Large.List.Df) that consists of 50+ arrays (each with 1000+ rows and 5+ columns). These arrays are all listed in double square brackets (e.g. [[A]]) in a drop-down menu when you open Large.List.Df.

I would like to use group_by() on name.of.column in each of the 50+ arrays so that I can mutate(name.of.new.column = 1:n()). I have used this combination of group_by(name.of.column) and mutate(name.of.new.column = 1:n()) on a normal dataframe (so just one element of the large list) and it works perfectly. But, if I run:

NewDf <- Large.List.Df %>% group_by(name.of.column) %>% mutate(name.of.new.column = 1:n())

I get the following error message:

Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "list"

I hope this all makes sense. I would be very grateful for any suggestions, advice, help, etc!

Thanks!

Are you looking for something like the following? If not, please post an example of your data as a Reproducible Example.

library(purrr)
#> Warning: package 'purrr' was built under R version 3.5.3
library(dplyr)

LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))
LIST
#> $A
#>   D B
#> 1 1 A
#> 2 2 B
#> 3 3 C
#> 4 4 A
#> 5 5 B
#> 6 6 C
#> 
#> $C
#>   E B
#> 1 2 A
#> 2 3 B
#> 3 4 C
#> 4 5 A
#> 5 6 B
#> 6 7 C

MyFunc <- function(DF) {
  DF %>% group_by(B) %>% 
    mutate(NewCol = 1:n()) %>%
    arrange(B)
}

LIST2 <- map(LIST, MyFunc)
LIST2
#> $A
#> # A tibble: 6 x 3
#> # Groups:   B [3]
#>       D B     NewCol
#>   <int> <fct>  <int>
#> 1     1 A          1
#> 2     4 A          2
#> 3     2 B          1
#> 4     5 B          2
#> 5     3 C          1
#> 6     6 C          2
#> 
#> $C
#> # A tibble: 6 x 3
#> # Groups:   B [3]
#>       E B     NewCol
#>   <int> <fct>  <int>
#> 1     2 A          1
#> 2     5 A          2
#> 3     3 B          1
#> 4     6 B          2
#> 5     4 C          1
#> 6     7 C          2

Created on 2019-09-16 by the reprex package (v0.2.1)

This was exactly what I needed - thank you so very much @FJCC !!! Please could you explain what the function(DF) does? I'm a total newbie and haven't actually written any functions yet.

Many thanks,
N

MyFunc <- function(DF) {
  DF %>% group_by(B) %>% 
    mutate(NewCol = 1:n()) %>%
    arrange(B)
}

The above part of my code defines a new function that takes one argument named DF. It processes DF through the steps within the braces: grouping by B, adding NewCol with mutate(), and sorting by B. It then returns the result of that process. It would have been clearer if I had written

MyFunc <- function(DF) {
  tmp <- DF %>% group_by(B) %>% 
    mutate(NewCol = 1:n()) %>%
    arrange(B)

  return(tmp)
}
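
As a side note for anyone new to functions, here is a minimal sketch of the two ways of returning a value, using a toy calculation. The names add_one_implicit and add_one_explicit are made up for illustration and are not part of my earlier code. In R, a function returns the value of the last expression it evaluates, so the explicit return() is optional in a case like this.

```r
# Hypothetical toy functions, only to illustrate how R returns values.

# Implicit return: the value of the last evaluated expression is returned.
add_one_implicit <- function(x) {
  x + 1
}

# Explicit return: the result is stored first, then handed back with return().
add_one_explicit <- function(x) {
  tmp <- x + 1
  return(tmp)
}

add_one_implicit(2)
add_one_explicit(2)

# Both versions give the same result
identical(add_one_implicit(2), add_one_explicit(2))
```

Both calls give 3. The two versions of MyFunc differ in the same way: only the style changes, not the result.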

After running that code, I can pass a data frame that has a column named B into MyFunc and get back a data frame with the additional NewCol. Below is an example of MyFunc acting on the first element of the LIST I defined in my previous post.

library(dplyr)

LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))
LIST
#> $A
#>   D B
#> 1 1 A
#> 2 2 B
#> 3 3 C
#> 4 4 A
#> 5 5 B
#> 6 6 C
#> 
#> $C
#>   E B
#> 1 2 A
#> 2 3 B
#> 3 4 C
#> 4 5 A
#> 5 6 B
#> 6 7 C

MyFunc <- function(DF) {
  DF %>% group_by(B) %>% 
    mutate(NewCol = 1:n()) %>%
    arrange(B)
}

subList <- MyFunc(LIST[[1]])
subList
#> # A tibble: 6 x 3
#> # Groups:   B [3]
#>       D B     NewCol
#>   <int> <fct>  <int>
#> 1     1 A          1
#> 2     4 A          2
#> 3     2 B          1
#> 4     5 B          2
#> 5     3 C          1
#> 6     6 C          2

Created on 2019-09-16 by the reprex package (v0.2.1)

MyFunc is no different than a standard R function like mean() that returns the average of whatever is passed to it, except that MyFunc is very simple, with no error handling or flexibility.
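
To give an idea of what a little error handling and flexibility could look like, here is a rough sketch. It is only one possible approach and assumes dplyr 1.0.0 or later (for across() and all_of()). The name add_group_counter and its group_col argument are hypothetical. It also uses row_number() in place of 1:n(), and it drops the grouping at the end with ungroup(), which MyFunc does not do.

```r
library(purrr)
library(dplyr)

# Hypothetical, more flexible variant of MyFunc: the grouping column is
# passed as a string instead of being hard-coded as B.
add_group_counter <- function(DF, group_col) {
  # Basic checks, so that a wrong input fails with a readable message
  if (!is.data.frame(DF)) {
    stop("DF must be a data frame", call. = FALSE)
  }
  if (!group_col %in% names(DF)) {
    stop("Column '", group_col, "' not found in DF", call. = FALSE)
  }

  DF %>%
    group_by(across(all_of(group_col))) %>%  # group by the named column
    mutate(NewCol = row_number()) %>%        # counter within each group
    arrange(.data[[group_col]]) %>%          # sort by the grouping column
    ungroup()                                # return an ungrouped tibble
}

LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))

# Extra arguments after the function name are passed on by map()
LIST3 <- map(LIST, add_group_counter, group_col = "B")
```

Whether to call ungroup() at the end is a matter of taste. Leaving the result grouped, as MyFunc does, means that later dplyr verbs keep operating per group.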

I coupled MyFunc with map(). map() takes a list as its first argument and a function as its second argument, applies the function to each element of the list, and returns the results as a new list.
The call

map(LIST, MyFunc)

applies MyFunc to each element of LIST and returns a list of the results with the same names as LIST.
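
To make that concrete, here is a small sketch that writes the same operation out as an explicit for loop and also with base R's lapply(). The object names by_loop, by_map and by_lapply are made up for this illustration. The original attempt failed because group_by() has no method for a plain list, so the function has to be applied to the data frames inside the list one at a time.

```r
library(purrr)
library(dplyr)

LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))

MyFunc <- function(DF) {
  DF %>% group_by(B) %>%
    mutate(NewCol = 1:n()) %>%
    arrange(B)
}

# What map(LIST, MyFunc) does, written as an explicit loop:
# create an empty list of the same length, then fill it element by element.
by_loop <- vector("list", length(LIST))
names(by_loop) <- names(LIST)
for (i in seq_along(LIST)) {
  by_loop[[i]] <- MyFunc(LIST[[i]])
}

# The same thing with purrr and with base R
by_map    <- map(LIST, MyFunc)
by_lapply <- lapply(LIST, MyFunc)

# All three results keep the element names of LIST
names(by_map)
```

For a simple case like this, lapply() and map() can be used interchangeably. map() mainly adds conveniences such as typed variants (for example map_dbl()) and a shorthand for anonymous functions.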

This is such a clear explanation - thank you so much @FJCC !
