# Calculating rowwise totals and proportions using tidyeval?

#1

I haven't delved too deep into tidyeval and quasiquotation yet, but I have a case where it seems like it makes sense to use and I need some help to make it work.

Say I have a tibble in wide format where each row is an election district and each column is the number of votes a candidate received. I want to calculate to total votes per district and the proportion of votes each candidate received in each district.

``````library(tidyverse)

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)

#> # A tibble: 3 x 4
#>   district     a     b     c
#>   <chr>    <int> <int> <int>
#> 1 dist1       20     5    10
#> 2 dist2       25     5    15
#> 3 dist3       10    15    15
``````

If I were just calculating it for this set of candidates, I would do something like the following:

(edited to include `rowise()`) for expected totals)

``````votes %>%
rowwise() %>%
mutate(
total = sum(a + b + c),
a_prop = a / total,
b_prop = b / total,
c_prop = c / total
)
#> Source: local data frame [3 x 8]
#> Groups: <by row>
#>
#> # A tibble: 3 x 8
#>   district     a     b     c total a_prop b_prop c_prop
#>   <chr>    <int> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#> 1 dist1       20     5    10    35  0.571  0.143  0.286
#> 2 dist2       25     5    15    45  0.556  0.111  0.333
#> 3 dist3       10    15    15    40  0.250  0.375  0.375
``````

But where I'm having trouble is if I want to write a function to do this where there could be any number of candidates with any possible name. I think this is a case where I would want to use `(...)` and `:=` in my function, but I'm not really sure how to go about it unquoting each element of the dots list to create a new column.

#3

I understand where you going with your line of thought, but it is possible to calculate this without `tidyeval` (or at least get really close to the result you want). Here is my a bit hacky attempt:

``````candidates <- c("a", "b", "c") # I assume you know name of the candidates in advance
tidyr::gather(key = "candidate", ... =  -district) %>%
dplyr::group_by(district) %>%
dplyr::mutate(prop = value/sum(value)) %>%
dplyr::select(-value) %>%
tidyr::spread(candidate, prop, sep = "_") %>%
dplyr::left_join(votes, by = "district") %>%
dplyr::ungroup() %>%
dplyr::mutate(total = rowSums(votes[, candidates]))
``````

#4

Thanks @mishabalyasin, that gets the desired output.

But I'm trying to figure out how to do this if I didn't know the names of candidates in advance or if I wanted to map this function over a list of tibbles with different numbers and names of candidates. So, that is the case where I'm thinking I may need to generalize with tidyeval.

#5

But you can always get names of candidates, no? For example:

``````nms <- dplyr::select(votes, -district) %>%
names()
``````

This way you don't need to know the names in advance, you can always get them in a format you need programmatically.

#6

One more version that uses `tidyeval`, but only slightly :

``````library(tidyverse)

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)

candidates <- votes %>% dplyr::select(-district) %>% names()

prop <- function(df, total = "total", candidate = "a"){
name <- paste0(candidate, "_prop")
prop <- df[[candidate]]/df[[total]]
df %>%
dplyr::mutate(!!name := prop)
}

sum_columns <- function(df, columns) {
rowSums(df[, columns])
}

f <- lapply(candidates, function(x){purrr::partial(prop, candidate = x)}) %>% Reduce(f = purrr::compose, x = .)

dplyr::mutate(total = sum_columns(., candidates)) %>%
f()

# A tibble: 3 x 8
district     a     b     c total c_prop b_prop a_prop
<chr>    <int> <int> <int> <dbl>  <dbl>  <dbl>  <dbl>
1 dist1       20     5    10  35.0  0.286  0.143  0.571
2 dist2       25     5    15  45.0  0.333  0.111  0.556
3 dist3       10    15    15  40.0  0.375  0.375  0.250

``````

In this case output is exactly the same (save for ordering) that you wanted, so can be a viable solution, I think.

#7

This solution works without tidyeval, provided that the data.frame consists only of the district variable and columns with candidate names:

``````library(tidyverse)

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)

candidate_proportion <- function(d) {
require(tidyverse)

d_gather <- d %>%
gather(key = "candidate", value = "vote_count", -district)

d_sum <- d_gather %>%
group_by(district) %>%
summarise(district_sum = sum(vote_count))

d_prop <- d_gather %>%
left_join(d_sum, by = "district") %>%
mutate(prop = vote_count / district_sum) %>%
select(-vote_count, -district_sum) %>%
mutate(candidate = paste0(candidate,"_prop")) %>%
spread(key = candidate, value = prop)

d %>% left_join(d_prop, by = "district")

}

``````

#8

These are all good solutions! I'm accepting @w.joel.schneider's as it's the most intuitive to me. I guess I was overthinking it and some reshaping and grouping can get this done without `tidyeval`.

Thanks!

#9

The previous solutions are already covering a lot, and I am not pretending to offer a better way.
It is just that When I saw the question, I wanted to play with `purrr`, `tidyeval` and list-columns in tibble to see what I could do. Specifically to found a way of doing this manipulation trying to fit with list columns in a unique tibble. This is just to share another workflow.

1. First preparing the tidy data (1 column = 1 variable)
2. Preparing each of desired spread format keeping all result in a unique tibble.
3. Extract what is needed, for example the all synthesis table.
``````library(tidyverse)

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)

# First tidy data

tidy_tab <- votes %>%
gather("candidates", "votes", -district) %>%
mutate(prop = votes / n) %>%
rename(total = n)
tidy_tab
#> # A tibble: 9 x 5
#>   district candidates votes total  prop
#>   <chr>    <chr>      <int> <int> <dbl>
#> 1 dist1    a             20    35 0.571
#> 2 dist2    a             25    45 0.556
#> 3 dist3    a             10    40 0.250
#> 4 dist1    b              5    35 0.143
#> 5 dist2    b              5    45 0.111
#> 6 dist3    b             15    40 0.375
#> 7 dist1    c             10    35 0.286
#> 8 dist2    c             15    45 0.333
#> 9 dist3    c             15    40 0.375

# use tidy eval to avoid repetition

spread_custom <- function(df, col) {
col <- enquo(col)
select(df, candidates, !! col) %>%
}

# but all in the same tibble
spread_tab <- tidy_tab %>%
nest(-district, -total) %>%
mutate(
prop = map(data, ~ spread_custom(.x, prop))
)
#> # A tibble: 3 x 5
#>   district total data             votes            prop
#>   <chr>    <int> <list>           <list>           <list>
#> 1 dist1       35 <tibble [3 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
#> 2 dist2       45 <tibble [3 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
#> 3 dist3       40 <tibble [3 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>

# get the tidy format
unnest(data)
#> # A tibble: 9 x 5
#>   district total candidates votes  prop
#>   <chr>    <int> <chr>      <int> <dbl>
#> 1 dist1       35 a             20 0.571
#> 2 dist1       35 b              5 0.143
#> 3 dist1       35 c             10 0.286
#> 4 dist2       45 a             25 0.556
#> 5 dist2       45 b              5 0.111
#> 6 dist2       45 c             15 0.333
#> 7 dist3       40 a             10 0.250
#> 8 dist3       40 b             15 0.375
#> 9 dist3       40 c             15 0.375

unnest(votes, .drop = T)
#> # A tibble: 3 x 5
#>   district total     a     b     c
#>   <chr>    <int> <int> <int> <int>
#> 1 dist1       35    20     5    10
#> 2 dist2       45    25     5    15
#> 3 dist3       40    10    15    15

# get the synthesis table
select(-data) %>%
unnest(.sep = "_")
#> # A tibble: 3 x 8
#>   <chr>    <int>   <int>   <int>   <int>  <dbl>  <dbl>  <dbl>
#> 1 dist1       35      20       5      10  0.571  0.143  0.286
#> 2 dist2       45      25       5      15  0.556  0.111  0.333
#> 3 dist3       40      10      15      15  0.250  0.375  0.375
``````

Created on 2018-02-04 by the reprex package (v0.1.1.9000).

#10

Below might be helpful:

``````library(tidyverse)

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)

# totals
df <- votes %>%
select(-district) %>%
rowwise() %>%
do(total = sum(unlist(.))) %>%
unnest() %>%

# proportion
candidates <- vars(colnames(df) %>% discard(~.x %in% c('district', 'total')))
df %>% mutate_at(candidates, ~.x / df\$total)
#> # A tibble: 3 x 5
#>   district     a     b     c total
#>   <chr>    <dbl> <dbl> <dbl> <int>
#> 1 dist1    0.571 0.143 0.286    35
#> 2 dist2    0.556 0.111 0.333    45
#> 3 dist3    0.25  0.375 0.375    40
``````

Created on 2018-10-20 by the reprex package (v0.2.1)

Session info
``````devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  tz       America/New_York
#>  date     2018-10-20
#> Packages -----------------------------------------------------------------
#>  package    * version     date       source
#>  assertthat   0.2.0       2017-04-11 CRAN (R 3.5.0)
#>  backports    1.1.2       2017-12-13 CRAN (R 3.5.0)
#>  base       * 3.5.1       2018-07-05 local
#>  bindr        0.1.1       2018-03-13 CRAN (R 3.5.0)
#>  bindrcpp   * 0.2.2       2018-03-29 CRAN (R 3.5.0)
#>  broom        0.5.0       2018-07-17 CRAN (R 3.5.0)
#>  cellranger   1.1.0       2016-07-27 CRAN (R 3.5.0)
#>  cli          1.0.0       2017-11-05 CRAN (R 3.5.0)
#>  colorspace   1.3-2       2016-12-14 CRAN (R 3.5.0)
#>  compiler     3.5.1       2018-07-05 local
#>  crayon       1.3.4       2017-09-16 CRAN (R 3.5.0)
#>  datasets   * 3.5.1       2018-07-05 local
#>  devtools     1.13.6      2018-06-27 CRAN (R 3.5.0)
#>  digest       0.6.18      2018-10-10 cran (@0.6.18)
#>  dplyr      * 0.7.6       2018-06-29 CRAN (R 3.5.1)
#>  evaluate     0.11        2018-07-17 CRAN (R 3.5.0)
#>  fansi        0.2.3       2018-05-06 CRAN (R 3.5.0)
#>  forcats    * 0.3.0       2018-02-19 CRAN (R 3.5.0)
#>  ggplot2    * 3.0.0       2018-07-03 CRAN (R 3.5.0)
#>  glue         1.3.0       2018-07-17 CRAN (R 3.5.0)
#>  graphics   * 3.5.1       2018-07-05 local
#>  grDevices  * 3.5.1       2018-07-05 local
#>  grid         3.5.1       2018-07-05 local
#>  gtable       0.2.0       2016-02-26 CRAN (R 3.5.0)
#>  haven        1.1.2       2018-06-27 CRAN (R 3.5.0)
#>  hms          0.4.2       2018-03-10 CRAN (R 3.5.0)
#>  htmltools    0.3.6       2017-04-28 CRAN (R 3.5.0)
#>  httr         1.3.1       2017-08-20 CRAN (R 3.5.0)
#>  jsonlite     1.5         2017-06-01 CRAN (R 3.5.0)
#>  knitr        1.20        2018-02-20 CRAN (R 3.5.0)
#>  lattice      0.20-35     2017-03-25 CRAN (R 3.5.0)
#>  lazyeval     0.2.1       2017-10-29 CRAN (R 3.5.0)
#>  lubridate    1.7.4       2018-04-11 CRAN (R 3.5.0)
#>  magrittr     1.5         2014-11-22 CRAN (R 3.5.0)
#>  memoise      1.1.0       2017-04-21 CRAN (R 3.5.0)
#>  methods    * 3.5.1       2018-07-05 local
#>  modelr       0.1.2       2018-05-11 CRAN (R 3.5.0)
#>  munsell      0.5.0       2018-06-12 CRAN (R 3.5.0)
#>  nlme         3.1-137     2018-04-07 CRAN (R 3.5.0)
#>  pillar       1.3.0       2018-07-14 CRAN (R 3.5.0)
#>  pkgconfig    2.0.1       2017-03-21 CRAN (R 3.5.0)
#>  plyr         1.8.4       2016-06-08 CRAN (R 3.5.0)
#>  purrr      * 0.2.5       2018-05-29 CRAN (R 3.5.0)
#>  R6           2.2.2       2017-06-17 CRAN (R 3.5.0)
#>  Rcpp         0.12.18     2018-07-23 CRAN (R 3.5.0)
#>  readr      * 1.1.1       2017-05-16 CRAN (R 3.5.0)
#>  readxl       1.1.0       2018-04-20 CRAN (R 3.5.0)
#>  rlang        0.2.99.0000 2018-10-14 Github (r-lib/rlang@30d6671)
#>  rmarkdown    1.10        2018-06-11 CRAN (R 3.5.0)
#>  rprojroot    1.3-2       2018-01-03 CRAN (R 3.5.0)
#>  rvest        0.3.2       2016-06-17 CRAN (R 3.5.0)
#>  scales       1.0.0       2018-08-09 CRAN (R 3.5.0)
#>  stats      * 3.5.1       2018-07-05 local
#>  stringi      1.2.4       2018-07-20 CRAN (R 3.5.0)
#>  stringr    * 1.3.1       2018-05-10 CRAN (R 3.5.0)
#>  tibble     * 1.4.2       2018-01-22 CRAN (R 3.5.0)
#>  tidyr      * 0.8.1       2018-05-18 CRAN (R 3.5.0)
#>  tidyselect   0.2.4       2018-02-26 CRAN (R 3.5.0)
#>  tidyverse  * 1.2.1       2017-11-14 CRAN (R 3.5.0)
#>  tools        3.5.1       2018-07-05 local
#>  utf8         1.1.4       2018-05-24 CRAN (R 3.5.0)
#>  utils      * 3.5.1       2018-07-05 local
#>  withr        2.1.2       2018-03-15 CRAN (R 3.5.0)
#>  xml2         1.2.0       2018-01-24 CRAN (R 3.5.0)
#>  yaml         2.2.0       2018-07-25 CRAN (R 3.5.0)
``````

#11

Hi,

I realize this has been solved but I also found that the `Janitor` package does something like this. The GitHub is here for it in case anyone finds it useful

``````library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.5.1
#> Warning: package 'dplyr' was built under R version 3.5.1
library(janitor)
#> Warning: package 'janitor' was built under R version 3.5.1

~district, ~a, ~b, ~c,
"dist1", 20L, 5L, 10L,
"dist2", 25L, 5L, 15L,
"dist3", 10L, 15L, 15L,
)