Is using rowwise() always the same as grouping by everything?

davidhodge931 · December 15, 2020, 6:33am

library(tidyverse)
df <- tibble(x = runif(6), y = runif(6), z = runif(6))
# Compute the mean of x, y, z in each row

df %>% 
  rowwise() %>% 
  mutate(m = mean(c(x, y, z)))

df %>% 
  group_by(across(everything())) %>% 
  mutate(m = mean(c(x, y, z)))

gueyenono · December 15, 2020, 2:42pm

Hi @davidhodge931,

Your simulated dataset does not contain a grouping variable, so it does not really exercise group_by(). In your example every row is almost certainly unique, so grouping by all columns puts each row in its own group and the two approaches give the same result. However, the results differ when you have an actual grouping variable:

library(dplyr)

d <- tibble(
  g = gl(3, 2),
  x = runif(6), 
  y = runif(6),
  z = runif(6)
)

d %>%
  group_by(g) %>%
  mutate(m = mean(c(x, y, z)))

d %>% 
  rowwise() %>% 
  mutate(m = mean(c(x, y, z)))
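
To make the difference easier to see, here is a small sketch with fixed values instead of runif(). The data frame `d2` and the column names `m_group` and `m_row` are made up for illustration. With group_by(g), mean(c(x, y, z)) pools every x, y and z value in the group; with rowwise(), it uses only the three values in the current row.

```r
library(dplyr)

# Hypothetical fixed data so the results are reproducible
d2 <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, 3, 4),
  y = c(1, 2, 3, 4),
  z = c(1, 2, 3, 4)
)

# One mean per group: pools all x, y, z values in the group
by_group <- d2 %>%
  group_by(g) %>%
  mutate(m_group = mean(c(x, y, z))) %>%
  ungroup()

# One mean per row: uses only that row's x, y, z
by_row <- d2 %>%
  rowwise() %>%
  mutate(m_row = mean(c(x, y, z))) %>%
  ungroup()

by_group$m_group  # 1.5 1.5 3.5 3.5
by_row$m_row      # 1 2 3 4
```

The grouped version repeats one value per group, while the rowwise version gives a separate value for every row.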

davidhodge931 · December 15, 2020, 8:16pm

Thanks @gueyenono

Having thought about this, I think the two are the same unless there is a duplicate row:

library(dplyr)

d <- tibble( # new data: the last row duplicates the first
  g = c("A", "B", "A"), 
  x = c(1, 2, 1), 
  y = c(1, 2, 1),
  z = c(1, 2, 1)
)

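# Note: mean(x, y, z) passes y and z as the `trim` and `na.rm` arguments,
# which errors when they have length 2 (as in the duplicated group),
# so the values are combined with c() first.

d %>% 
  group_by(across(everything())) %>% 
  mutate(m = mean(c(x, y, z))) # rows 1 and 3 share a group: the mean pools 6 values

d %>% 
  rowwise() %>% 
  mutate(m = mean(c(x, y, z))) # each row is its own group: the mean uses 3 values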
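
A mean cannot reveal the difference in values here, because duplicated rows contribute identical values to the pooled group. As a rough sketch (the names `d3`, `n_rows` and `s` are made up for illustration), counting rows with n() and summing with sum() shows where grouping by everything and rowwise() part ways:

```r
library(dplyr)

# Hypothetical data: row 3 duplicates row 1
d3 <- tibble(
  g = c("A", "B", "A"),
  x = c(1, 2, 1),
  y = c(1, 2, 1),
  z = c(1, 2, 1)
)

# Grouping by every column: identical rows end up in the same group
grouped <- d3 %>%
  group_by(across(everything())) %>%
  mutate(n_rows = n(), s = sum(c(x, y, z))) %>%
  ungroup()

# rowwise(): every row is its own group, duplicates included
by_row <- d3 %>%
  rowwise() %>%
  mutate(n_rows = n(), s = sum(c(x, y, z))) %>%
  ungroup()

grouped$n_rows  # 2 1 2
by_row$n_rows   # 1 1 1
grouped$s       # 6 6 6
by_row$s        # 3 6 3
```

So the two approaches agree only when every row is unique, or when the summary function happens to be insensitive to repeated values.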
