Stuck on some functional programming points

Say I have the following tibble:

people <- tibble::tribble(
          ~name,  ~gender, ~state_residence, ~city_residence,
          "Dan",   "Male",        "Indiana",  "Indianapolis",
        "Jerry",   "Male",       "New York", "New York City",
       "Bojack",   "Male",     "California",   "Los Angeles",
       "Leslie", "Female",        "Indiana",        "Pawnee",
          "Liz", "Female",       "New York", "New York City",
        "Jesse",   "Male",     "California", "San Francisco",
       "Daphne", "Female",     "Washington",       "Seattle"
)

And I've defined the following function:

library(dplyr)

count_and_percent <- function(tbl, var_obj, var_name) {
    tbl %>%
    group_by({{var_obj}}) %>%
    summarize(count = n()) %>%
    mutate(
        variable = var_name,
        percent = round(
            (count/nrow(tbl)*100),
            digits = 2
        )
    ) %>%
    rename(
        category = {{var_obj}}
    )
}

So for example I get:

> count_and_percent(people, state_residence, "state_residence")
# A tibble: 4 × 4
  category   count variable        percent
  <chr>      <int> <chr>             <dbl>
1 California     2 state_residence    28.6
2 Indiana        2 state_residence    28.6
3 New York       2 state_residence    28.6
4 Washington     1 state_residence    14.3
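A more compact variant is possible with dplyr::count(), which combines group_by() and summarize(n = n()) in one call. The sketch below is an alternative, not the function from the question. count_and_percent_v2 is a hypothetical name. It divides by sum(count), which equals nrow(tbl) here because count() keeps missing values as their own group.

```r
library(dplyr)

# Hypothetical alternative to count_and_percent(), built on count()
count_and_percent_v2 <- function(tbl, var_obj, var_name) {
  tbl %>%
    # count() = group_by() + summarize(); name = sets the count column's name
    count({{ var_obj }}, name = "count") %>%
    mutate(
      variable = var_name,
      # sum(count) equals nrow(tbl) because NA values form their own group
      percent = round(count / sum(count) * 100, digits = 2)
    ) %>%
    rename(category = {{ var_obj }})
}
```

Called the same way, count_and_percent_v2(people, state_residence, "state_residence") should return the same four rows as above.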

Question 1: Is there a way to modify the function so that the var_obj and var_name arguments can be combined into one argument?

Question 2: If I want to apply the count_and_percent function across multiple columns of the same data frame, could you illustrate how to do that with one of the purrr functions? I'm still trying to wrap my head around this stuff.

For part 1, you could pass the column name as a string and refer to the column through the .data pronoun:

count_and_percent <- function(tbl, var_name) {
  tbl %>%
    group_by(.data[[var_name]]) %>%
    summarize(count = n()) %>%
    mutate(
      variable = var_name,
      percent = round(
        (count/nrow(tbl)*100),
        digits = 2
      )
    ) %>%
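    # .data[[ ]] inside tidyselect verbs such as rename() is deprecated as of
    # tidyselect 1.2.0; all_of() is the supported way to select by a string
    rename(
      category = all_of(var_name)
    )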
}


> count_and_percent(people, "state_residence")
# A tibble: 4 × 4
  category   count variable        percent
  <chr>      <int> <chr>             <dbl>
1 California     2 state_residence    28.6
2 Indiana        2 state_residence    28.6
3 New York       2 state_residence    28.6
4 Washington     1 state_residence    14.3
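If you would rather keep passing the column unquoted, another option is to capture the argument once and derive the label from it with rlang::enquo() and rlang::as_label(). This is a sketch assuming dplyr and rlang are installed; count_and_percent_bare is a hypothetical name.

```r
library(dplyr)
library(rlang)

# Hypothetical one-argument version that takes a bare (unquoted) column name
count_and_percent_bare <- function(tbl, var_obj) {
  # as_label() turns the captured expression into a string, e.g. "state_residence"
  var_name <- as_label(enquo(var_obj))
  tbl %>%
    group_by({{ var_obj }}) %>%
    summarize(count = n()) %>%
    mutate(
      variable = var_name,
      percent = round(count / nrow(tbl) * 100, digits = 2)
    ) %>%
    rename(category = {{ var_obj }})
}
```

With this version, count_and_percent_bare(people, state_residence) fills the variable column with "state_residence". The string-based version is easier to drive from a character vector of column names, which is what part 2 needs.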

Are you after something like this for part 2?

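library(purrr)

# names(people[2:4]) would put gender first; listing the columns
# explicitly gives the row order shown below
columns <- c("state_residence", "city_residence", "gender")
map_df(columns, ~count_and_percent(people, .x))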


# A tibble: 12 × 4
   category      count variable        percent
   <chr>         <int> <chr>             <dbl>
 1 California        2 state_residence    28.6
 2 Indiana           2 state_residence    28.6
 3 New York          2 state_residence    28.6
 4 Washington        1 state_residence    14.3
 5 Indianapolis      1 city_residence     14.3
 6 Los Angeles       1 city_residence     14.3
 7 New York City     2 city_residence     28.6
 8 Pawnee            1 city_residence     14.3
 9 San Francisco     1 city_residence     14.3
10 Seattle           1 city_residence     14.3
11 Female            3 gender             42.9
12 Male              4 gender             57.1
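As of purrr 1.0.0, map_df() is superseded; the recommended replacement is map() followed by list_rbind(). Below is a minimal sketch that wraps that pattern. summarise_columns is a hypothetical helper name, and f is assumed to be any function taking (tbl, column_name), such as the string-based count_and_percent.

```r
library(purrr)

# Hypothetical helper: apply f(tbl, col) to each column name, then stack the results
summarise_columns <- function(tbl, columns, f) {
  # map() returns a list holding one tibble per column name
  pieces <- map(columns, function(col) f(tbl, col))
  # list_rbind() binds that list row-wise into a single tibble (purrr >= 1.0.0)
  list_rbind(pieces)
}
```

For example, summarise_columns(people, c("state_residence", "city_residence", "gender"), count_and_percent) should give the same 12 rows as the map_df() call.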

Yes exactly, thank you!
