I'm trying to arrange calculated columns immediately after source columns when using dplyr column-wise operations
library(dplyr)
Example data
df <- tibble(
id = c(1, 2, 2),
id_row = c(1, 1, 2),
name_first = c("John", "Jane", "Jane"),
city = c("NY", "DAL", "HOU"),
x = c(0, 1, 0)
)
# A tibble: 3 x 5
id id_row name_first city x
<dbl> <dbl> <chr> <chr> <dbl>
1 1 1 John NY 0
2 2 1 Jane DAL 1
3 2 2 Jane HOU 0
Within ID, the values of name_first and city should be constant. The values for id_row and x need not be constant. I want to test for and inspect differing values of name_first and city within-id.
One convenient way to do this is with mutate()
and across()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
)
# A tibble: 3 x 7
# Groups: id [2]
id id_row name_first city x name_first_all_match city_all_match
<dbl> <dbl> <chr> <chr> <dbl> <lgl> <lgl>
1 1 1 John NY 0 TRUE TRUE
2 2 1 Jane DAL 1 TRUE FALSE
3 2 2 Jane HOU 0 TRUE FALSE
The issue is that the "all_match" (calculated) columns are added to the far right of the data frame rather than immediately after their source column. This makes it difficult to visually inspect differing values for the columns of interest.
Of course, in this small data frame, we could easily rearrange the columns using select()
.
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(id, id_row, starts_with("name_first"), starts_with("city"), x)
# A tibble: 3 x 7
# Groups: id [2]
id id_row name_first name_first_all_match city city_all_match x
<dbl> <dbl> <chr> <lgl> <chr> <lgl> <dbl>
1 1 1 John TRUE NY TRUE 0
2 2 1 Jane TRUE DAL FALSE 1
3 2 2 Jane TRUE HOU FALSE 0
The issue with that approach is that it quickly becomes pretty cumbersome with more columns. A more tractable approach would be to sort the names alphabetically...
df %>%
group_by(id) %>%
mutate(
across(
.cols = c(name_first, city),
.fns = ~ length(unique(.x)) == 1,
.names = "{col}_all_match"
)
) %>%
select(sort(names(.)))
# A tibble: 3 x 7
# Groups: id [2]
city city_all_match id id_row name_first name_first_all_match x
<chr> <lgl> <dbl> <dbl> <chr> <lgl> <dbl>
1 NY TRUE 1 1 John TRUE 0
2 DAL FALSE 2 1 Jane TRUE 1
3 HOU FALSE 2 2 Jane TRUE 0
...but in my situation I need to preserve the original column order. I'd also prefer to stick with Tidyverse solutions if possible.
Any ideas are appreciated!