Add specific rows to create new row using R dplyr

dplyr

#1

I am trying to add a new row Total which would be a sum of apples+bananas+oranges

      Metric_name         May      Jun 
1       apples             11       34       
2       bananas            632      640      
3       onions             479      503      
4       oranges            910      939    

How can I do it using dplyr?


#2

Based on the requirements, this is how I'd approach this:

library(tidyverse)

df <- tribble(
  ~Metric_name,         ~May,      ~Jun, 
  "apples",             11,       34,       
  "bananas",            632,      640,      
  "onions",             479,      503,      
  "oranges",            910,      939    
)

df %>%
  bind_rows(
    df %>%
      filter(Metric_name %in% c("apples", "bananas", "oranges")) %>%
      summarise_if(is.numeric, sum) %>%
      mutate(Metric_name = "Fruit totals") 
  )
#> # A tibble: 5 x 3
#>   Metric_name    May   Jun
#>   <chr>        <dbl> <dbl>
#> 1 apples          11    34
#> 2 bananas        632   640
#> 3 onions         479   503
#> 4 oranges        910   939
#> 5 Fruit totals  1553  1613

Created on 2018-08-24 by the reprex package (v0.2.0).


#3

Another approach:

library(tidyverse)
ex <- tibble::tibble(
  metric = c("apples", "bananas", "oranges", "not fruit"),
  May = rnorm(4),
  Jun = rnorm(4)
)

metrics <-rlang::syms(c("apples", "bananas", "oranges"))

ex %>%
  tidyr::gather(key = month, value = sales, -metric) %>%
  tidyr::spread(metric, sales) %>%
  dplyr::mutate(total = purrr::pmap_dbl(list(!!!metrics), ~sum(...))) %>%
  tidyr::gather(key = metric, value = sales, -month) %>%
  tidyr::spread(month, sales)
#> # A tibble: 5 x 3
#>   metric       Jun    May
#>   <chr>      <dbl>  <dbl>
#> 1 apples    -0.729  0.432
#> 2 bananas    0.340  0.722
#> 3 not fruit  0.267 -0.750
#> 4 oranges    2.14  -0.839
#> 5 total      1.75   0.315

Created on 2018-08-24 by the reprex package (v0.2.0).


#4

Though this may seem like a crazy way of doing things, I just wanted to add a "tidy" version of the task—not of adding a row, as such, but of getting the result (something to the effect of fruit total by month) using summarise().

library(tidyverse)

df <- tribble(
  ~Metric_name,         ~May,      ~Jun, 
  "apples",             11,       34,       
  "bananas",            632,      640,      
  "onions",             479,      503,      
  "oranges",            910,      939    
)

gathered_df <- df %>%
  gather(key = month, value = number, -Metric_name)  # -Metric_name means everything but Metric_name

head(gathered_df)
#> # A tibble: 6 x 3
#>   Metric_name month number
#>   <chr>       <chr>  <dbl>
#> 1 apples      May       11
#> 2 bananas     May      632
#> 3 onions      May      479
#> 4 oranges     May      910
#> 5 apples      Jun       34
#> 6 bananas     Jun      640

gathered_df %>%
  filter(Metric_name %in% c("apples", "bananas", "oranges")) %>%
  group_by(month) %>%
  summarise(total = sum(number))
#> # A tibble: 2 x 2
#>   month total
#>   <chr> <dbl>
#> 1 Jun    1613
#> 2 May    1553

Created on 2018-08-25 by the reprex package (v0.2.0.9000).

n.b. All of these answers are, in effect, doing the same thing—I'm just sticking to the basic tidyr and dplyr verbs, and, in keeping with tidy data principles, I avoid mixing summary information with individual observations in a data frame.

Edit: Technically, the format with separate months isn't "untidy," espeically not if the measures aren't actually the same (I'm assuming it's count or cost). The advantage of using the tidy format is really that it fits nicely in other tidyverse pipelines, e.g. with ggplot2

gathered_df %>%
  filter(Metric_name %in% c("apples", "bananas", "oranges")) %>%
  mutate(month = as_factor(month)) %>%
  ggplot(aes(x = Metric_name, y = number, fill = month)) +
  geom_bar(stat = "identity", position = position_dodge())


gathered_df %>%
  filter(Metric_name %in% c("apples", "bananas", "oranges")) %>%
  mutate(month = as_factor(month)) %>%
  ggplot(aes(x = month, y = number, fill = Metric_name)) +
  geom_bar(stat = "identity", position = position_dodge())