Extract data frame from a list with condition

Suppose I have a list containing many data frames with varying columns. How do I extract the data frame that has four columns?

toy_list <- list(
  df1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  df2 = data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8),
  df3 = data.frame(a = 1:2)
)

You can get the number of columns in a data.frame using ncol.

So in your case, calling ncol(toy_list$df1) returns 3.

You want to apply this to every item in the list, which makes it a good job for the apply family, in this case lapply.

dfcols <- lapply(toy_list, ncol)
dfcols
#> $df1
#> [1] 3
#>
#> $df2
#> [1] 4
#>
#> $df3
#> [1] 1

We are already halfway there. We can compare the result against an integer, in this case the number of columns we want, which gives one TRUE or FALSE per data frame.

dfcols == 4
#>  df1   df2   df3
#>  FALSE  TRUE FALSE

And use this for logical subsetting of the original list.

selection <- toy_list[dfcols == 4]
selection
#> $df2
#>   a b c d
#> 1 1 3 5 7
#> 2 2 4 6 8

Note that this returns a list as well, containing every entry that has four columns. You can extract individual items from that list with [[.

selection[[1]]
#>   a b c d
#> 1 1 3 5 7
#> 2 2 4 6 8
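
One caveat: dfcols is a list, and the == comparison only works because every element has length one. A minimal sketch of a stricter base R variant uses vapply, which checks that each result is a single integer and returns a named integer vector instead of a list. The names n_cols and four_cols are only illustrative.

```r
toy_list <- list(
  df1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  df2 = data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8),
  df3 = data.frame(a = 1:2)
)

# vapply() errors if any result is not a single integer,
# and returns a named integer vector rather than a list.
n_cols <- vapply(toy_list, ncol, integer(1))
n_cols
#> df1 df2 df3 
#>   3   4   1 

# Logical subsetting works the same way as with the lapply result.
four_cols <- toy_list[n_cols == 4]
names(four_cols)
#> [1] "df2"
```

If an element of the list is not a data frame or matrix, ncol returns NULL for it and vapply stops with an error instead of silently producing an odd comparison.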

Note that instead of lapply you could also use purrr::map in the same way if you want a tidyverse solution.


Thanks a lot for the detailed solution! If you get time, please share the tidyverse approach. Beginner here. Cheers!

With purrr you can do it like this:

library("purrr")
selection <- toy_list %>% keep(~ ncol(.x) == 4)
selection
#> $df2
#>   a b c d
#> 1 1 3 5 7
#> 2 2 4 6 8

Note that this returns a list, so as above you need to extract individual elements separately.

selection[[1]]
#>   a b c d
#> 1 1 3 5 7
#> 2 2 4 6 8

If you want to keep the number of columns of each data.frame around, you can get it with a call similar to the lapply above, but using map.

toy_list %>% map(ncol)
#> $df1
#> [1] 3
#>
#> $df2
#> [1] 4
#>
#> $df3
#> [1] 1

And finally, of course you can compress the non-tidyverse approach from my previous post into a one-liner as well:

selection <- toy_list[lapply(toy_list, ncol) == 4]
selection
#> $df2
#>   a b c d
#> 1 1 3 5 7
#> 2 2 4 6 8
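
If you like the shape of keep but want to stay in base R, Filter does roughly the same job: it takes a predicate function and a list and returns the elements for which the predicate is TRUE. A small sketch, with has_four_cols and four_cols as made-up names:

```r
toy_list <- list(
  df1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  df2 = data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8),
  df3 = data.frame(a = 1:2)
)

# Predicate: TRUE for data frames with exactly four columns.
has_four_cols <- function(df) ncol(df) == 4

# Filter() keeps the list elements for which the predicate is TRUE.
four_cols <- Filter(has_four_cols, toy_list)
names(four_cols)
#> [1] "df2"

# [[1]] fails on an empty list, so check the length before extracting.
if (length(four_cols) > 0) {
  first_match <- four_cols[[1]]
}
```

The length check is worth keeping in real code, since four_cols[[1]] throws a "subscript out of bounds" error when no data frame matches.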

Many thanks again for the detailed explanation!
