Error: Apply count function across multiple columns

library(tidyverse)

census_data <- tibble::tribble(
  ~location,          ~population_est,
  "Alachua County",            218222,
  "Baker County",               21516,
  "Bay County",                145705,
  "Bradford County",            21920,
  "Brevard County",            475860,
  "Broward County",           1531882,
  "Calhoun County",             11682,
  "Charlotte County",          153842,
  "Citrus County",             122938,
  "Clay County",               160699
)

# when applying the count() function to one column, the code works:

census_data %>%
  separate(location, into = c("loc_1", "loc_2"), sep = " ") %>%
  count(loc_1)
#> # A tibble: 10 x 2
#>    loc_1         n
#>    <chr>     <int>
#>  1 Alachua       1
#>  2 Baker         1
#>  3 Bay           1
#>  4 Bradford      1
#>  5 Brevard       1
#>  6 Broward       1
#>  7 Calhoun       1
#>  8 Charlotte     1
#>  9 Citrus        1
#> 10 Clay          1


# In this case, when applying the count() function to more than one column, I get the error message below:

census_data %>%
  separate(location, into = c("loc_1", "loc_2"), sep = " ") %>%
  summarize(c_loc_1 = count(loc_1), c_loc_2 = count(loc_2))
#> Error in `summarize()`:
#> ! Problem while computing `c_loc_1 = count(loc_1)`.
#> Caused by error in `UseMethod()`:
#> ! no applicable method for 'count' applied to an object of class "character"

Created on 2022-08-22 by the reprex package (v2.0.1)

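If you look at the dplyr::count documentation, count() is a summarizing function that takes a data frame as its first argument, not a single column. This is from the documentation:

df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n())

So your code is essentially summarizing twice: inside summarize(), loc_1 is a plain character vector, and count() has no method for that class, hence the error. I would do something like this: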

census_data %>% 
    separate(location, into = c('loc_1', 'loc_2'), sep = ' ') %>% 
    count(loc_1, loc_2)
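
If what is wanted is one summary value per column rather than a frequency table, summarize() needs functions that accept a vector. A minimal sketch, assuming the number of distinct values per column is of interest (n_distinct() is a choice made for this illustration and is not part of the original question; the data is a shortened copy of the example):

```r
library(dplyr)
library(tidyr)

# Shortened copy of the example data, for illustration only
census_data <- tibble::tribble(
  ~location,        ~population_est,
  "Alachua County",          218222,
  "Baker County",             21516,
  "Bay County",              145705
)

# across() applies a vector function to each selected column inside summarize()
distinct_counts <- census_data %>%
  separate(location, into = c("loc_1", "loc_2"), sep = " ") %>%
  summarize(across(c(loc_1, loc_2), n_distinct))

distinct_counts
```

This works because n_distinct() takes a vector, whereas count() expects the whole data frame.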

Hi @dvetsch75 ,

Thanks for your answer. This helps.

I guess I left out my next question in the example provided. As the title suggests, I'm trying to apply the count() function (used as an example here, but would like to extend to other functions) iteratively across columns.

Your answer provides the results for both columns together. How would I go about approaching this same logic but applied to each column individually?

If I try the map() function, I get the same error as the original example.



census_data <- census_data %>% 
  separate(location, into = c('loc_1', 'loc_2'), sep = ' ')

# Your answer

census_data %>% 
  count(loc_1, loc_2)
#> # A tibble: 10 x 3
#>    loc_1     loc_2      n
#>    <chr>     <chr>  <int>
#>  1 Alachua   County     1
#>  2 Baker     County     1
#>  3 Bay       County     1
#>  4 Bradford  County     1
#>  5 Brevard   County     1
#>  6 Broward   County     1
#>  7 Calhoun   County     1
#>  8 Charlotte County     1
#>  9 Citrus    County     1
#> 10 Clay      County     1

# My goal

census_data %>% 
  count(loc_1)
#> # A tibble: 10 x 2
#>    loc_1         n
#>    <chr>     <int>
#>  1 Alachua       1
#>  2 Baker         1
#>  3 Bay           1
#>  4 Bradford      1
#>  5 Brevard       1
#>  6 Broward       1
#>  7 Calhoun       1
#>  8 Charlotte     1
#>  9 Citrus        1
#> 10 Clay          1

# then

census_data %>% 
  count(loc_2)
#> # A tibble: 1 x 2
#>   loc_2      n
#>   <chr>  <int>
#> 1 County    10

# Approached using purrr function map():

census_data %>% 
  map(., ~ count(.x))
#> Error in UseMethod("count"): no applicable method for 'count' applied to an object of class "character"

Created on 2022-08-22 by the reprex package (v2.0.1)
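
The map() attempt fails for the same reason as the summarize() one: map() over a data frame iterates over its columns, so each .x is a bare vector and count() has nothing to dispatch on. A small sketch of what does work at that level, assuming base R's table() is an acceptable stand-in for counting the values of a vector (the object name census_split and the shortened data are hypothetical):

```r
library(dplyr)
library(purrr)

# Hypothetical shortened version of the separated data
census_split <- tibble::tibble(
  loc_1 = c("Alachua", "Baker", "Bay"),
  loc_2 = c("County", "County", "County"),
  population_est = c(218222, 21516, 145705)
)

# Each .x is a plain character vector here, so a vector-based
# counter such as table() can be applied to it
value_tables <- census_split %>%
  select(where(is.character)) %>%
  map(~ table(.x))

value_tables
```

The result is a list of table objects rather than tibbles, so it is mainly useful for a quick look at the values.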

library(tidyverse)

start_df <- tibble::tribble(
  ~loc_1,      ~loc_2,   ~population_est,
  "Alachua",   "County",          218222,
  "Baker",     "County",           21516,
  "Bay",       "County",          145705,
  "Bradford",  "County",           21920,
  "Brevard",   "County",          475860,
  "Broward",   "County",         1531882,
  "Calhoun",   "County",           11682,
  "Charlotte", "County",          153842,
  "Citrus",    "County",          122938,
  "Clay",      "County",          160699
)

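# names of the character columns to count
locs <- names(select(start_df, where(is.character)))

# count each column separately: sym() turns the column name (a string)
# into a symbol, and !! injects that symbol into the count() call
map(locs, ~ count(start_df, !!sym(.x)))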
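
A variation on the same idea, sketched here with the .data pronoun instead of !!sym(), and with set_names() so that each result in the list is labeled by its column. The object name counts and the shortened data are assumptions made for this illustration:

```r
library(dplyr)
library(purrr)

# Shortened copy of the example data, for illustration only
start_df <- tibble::tribble(
  ~loc_1,    ~loc_2,   ~population_est,
  "Alachua", "County",          218222,
  "Baker",   "County",           21516,
  "Bay",     "County",          145705
)

locs <- names(select(start_df, where(is.character)))

# set_names(locs) names the vector by its own values, so map() returns
# a named list; .data[[.x]] looks the column up by its string name
counts <- map(set_names(locs), ~ count(start_df, .data[[.x]]))

counts$loc_2
```

Either form should give one count table per column; the named list just makes it easier to pull out a specific column's result afterwards.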

Thanks @nirgrahamuk !

Time to familiarize myself with the rlang package. I have not used the injection operator !! before, or the sym() function for that matter.

Any resources for learning more about their use would be greatly appreciated.

My pick would be the Metaprogramming introduction in Advanced R (hadley.nz).
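
As a quick orientation before diving into that material, here is a small sketch of what sym() and !! each do. The variable names col_name, col_sym, by_string and by_name are made up for this illustration:

```r
library(dplyr)
library(rlang)

start_df <- tibble::tibble(loc_2 = c("County", "County", "County"))

col_name <- "loc_2"       # the column name as a string
col_sym <- sym(col_name)  # the same name as a symbol, i.e. unquoted loc_2

is_symbol(col_sym)

# !! injects the symbol into the call before count() evaluates it,
# so this is intended to behave like writing count(start_df, loc_2)
by_string <- count(start_df, !!col_sym)
by_name <- count(start_df, loc_2)

all.equal(by_string, by_name)
```

The point of the two-step approach is that the column can be chosen at run time from a string, which is what makes it usable inside map().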
