mutate_all: modify the values of each variable by the variable's mean

libjohn · April 11, 2018, 9:30pm

Is there a way to modify each value of a data frame variable by dividing it by the mean of that variable? I'm trying to use mutate_all and hitting a wall in my understanding. I've tried a few approaches and nothing seems to work.

I want to do something like this:

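library(tidyverse)

# The two variables to rescale
sw_height_mass <- starwars %>%
  select(height, mass)

# One column per variable mean (the mean is repeated on every row)
starwars_means <- starwars %>%
  select(height, mass) %>%
  mutate_all(mean, na.rm = TRUE) %>%
  rename(hmean = height,
         mmean = mass)

# Put the values and their means side by side, then divide
starwars_what_i_want <-
  bind_cols(sw_height_mass, starwars_means) %>%
  mutate(new_height = height / hmean,
         new_mass = mass / mmean) %>%
  select(new_height, new_mass)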

starwars_what_i_want

Ideally with less code, I imagine something like this...

starwars %>% 
  select(height, mass) %>% 
  mutate_all((. / mean), na.rm = TRUE)

But my imagination doesn't match reality, and I can't figure out what will work. My actual data has a lot more variables.

Thanks for any consideration.

prosoitos · April 11, 2018, 9:57pm

You should look at purrr::map().

prosoitos · April 12, 2018, 12:06am

Here is your code:

library(tidyverse)

starwars %>%
  select(height, mass) %>%
  map_dfc(~ . / mean(., na.rm = TRUE))  # divide each column by its own mean
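To see what that formula does, here is a minimal sketch on a small hypothetical tibble (`toy` and `scaled` are illustrative names, not from the thread). Each column is divided by its own mean, and `na.rm = TRUE` means a missing value stays `NA` in the output instead of turning the whole column into `NA`.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data (not from the thread): one NA in `mass`
toy <- tibble(height = c(100, 200, 300),
              mass   = c(10, NA, 30))

# map_dfc() applies the formula to each column and binds the
# results back into a tibble; na.rm = TRUE skips the NA when
# computing the mean
scaled <- toy %>%
  map_dfc(~ . / mean(., na.rm = TRUE))

scaled
```

Here `height` has mean 200 and becomes 0.5, 1, 1.5, while `mass` has mean 20 and becomes 0.5, NA, 1.5.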

torvaney · April 12, 2018, 11:16am

Another method might be to use tidyr::gather and tidyr::spread.

library(tidyverse)

starwars %>%
  select(name, height, mass) %>%                        # Optional (only here to make output clearer)
  gather(key, value, height, mass) %>%                  # Go from wide to long data
  group_by(key) %>%                                     # Groups are `height` and `mass` (see: `gather` args)
  mutate(value = value / mean(value, na.rm = TRUE)) %>% # Divide by group mean
  spread(key, value)                                    # Go back from long to wide data

#> # A tibble: 87 x 3
#>    name                height   mass
#>    <chr>                <dbl>  <dbl>
#>  1 Ackbar               1.03   0.853
#>  2 Adi Gallia           1.06   0.514
#>  3 Anakin Skywalker     1.08   0.863
#>  4 Arvel Crynyd        NA     NA    
#>  5 Ayla Secura          1.02   0.565
#>  6 Bail Prestor Organa  1.10  NA    
#>  7 Barriss Offee        0.952  0.514
#>  8 BB8                 NA     NA    
#>  9 Ben Quadinaros       0.935  0.668
#> 10 Beru Whitesun lars   0.946  0.771
#> # ... with 77 more rows
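For readers on tidyr 1.0.0 or later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below assumes that version and follows the same long, group, divide, wide steps; `sw_ratio` is an illustrative name. The `ungroup()` before widening is an added precaution, and the row order may differ from the `spread()` output above.

```r
library(dplyr)
library(tidyr)

sw_ratio <- starwars %>%
  select(name, height, mass) %>%
  # Wide to long: one row per (name, variable) pair
  pivot_longer(c(height, mass), names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  # Divide each value by the mean of its variable
  mutate(value = value / mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  # Long back to wide: one column per variable again
  pivot_wider(names_from = key, values_from = value)

sw_ratio
```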

libjohn · April 12, 2018, 2:09pm

Thanks @prosoitos.

You helped me learn a bit more about the whole '~ .' notation. Pointed me in a good direction. Huge help!!
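A small sketch of what the `~ .` shorthand amounts to, assuming purrr is installed: `as_mapper()` turns the one-sided formula into an ordinary function whose argument is available as `.` (or `.x`). `f`, `g`, and `x` are illustrative names.

```r
library(purrr)

# Convert the formula shorthand into a regular function
f <- as_mapper(~ . / mean(., na.rm = TRUE))

# The equivalent hand-written function
g <- function(x) x / mean(x, na.rm = TRUE)

x <- c(2, 4, NA, 6)
f(x)  # mean of the non-missing values is 4, so 0.5 1.0 NA 1.5
g(x)  # same result
```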

Applying that, I see these two are equivalent...

starwars %>% 
  select(height, mass) %>% 
  mutate_all(~ . / mean(., na.rm = TRUE))

starwars %>%
  select(height, mass) %>%
  map_dfc(~ . / mean(., na.rm = TRUE))

Thanks again.
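Since dplyr 1.0.0, scoped verbs such as `mutate_all()` are superseded by `across()`. A minimal sketch of the same computation, assuming dplyr 1.0.0 or later (`sw_scaled` and `sw_new` are illustrative names). The `.names` argument keeps the original columns and adds `new_`-prefixed ones, much like `new_height` and `new_mass` in the original question.

```r
library(dplyr)

# Replace each column with itself divided by its mean
sw_scaled <- starwars %>%
  select(height, mass) %>%
  mutate(across(everything(), ~ .x / mean(.x, na.rm = TRUE)))

# Keep the originals and add new_height / new_mass alongside them
sw_new <- starwars %>%
  select(height, mass) %>%
  mutate(across(c(height, mass), ~ .x / mean(.x, na.rm = TRUE),
                .names = "new_{.col}"))
```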

libjohn · April 12, 2018, 2:14pm

Thanks @torvaney. Nice approach. Thanks for the commenting / documentation.

prosoitos · April 12, 2018, 3:38pm

They are equivalent here because the input is a data frame. The map solution is much more general, in that it works with any list as input (and you can use map, map_dbl, map_dfc, etc. to get various classes of output).

That said, if you are working with data frames, staying within dplyr makes sense.
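To illustrate the point about generality, here is a minimal sketch on a hypothetical plain list whose elements have different lengths, so it could not be a data frame (`measurements`, `ratios`, and `means` are illustrative names). `map()` returns a list, while `map_dbl()` returns a named double vector.

```r
library(purrr)

# Hypothetical list: elements of different lengths, one NA
measurements <- list(a = c(1, 2, 3), b = c(10, NA, 30, 40))

# map() returns a list of rescaled vectors, one per element
ratios <- map(measurements, ~ . / mean(., na.rm = TRUE))

# map_dbl() returns a named double vector (here, the means)
means <- map_dbl(measurements, mean, na.rm = TRUE)

ratios
means
```

`means` comes back as `a = 2` and `b` approximately 26.67, and `ratios$b` keeps its `NA` in the second position.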