# Calculate sd across a subset of columns based on a regex

Okay, so I'm looking for a tidy solution to the following:

Given a tibble `d`:

``````library('tidyverse')
set.seed(733744)
n = 10
d = tibble(x1 = rnorm(n), x2 = rnorm(n),
y1 = rnorm(n), y2 = rnorm(n),
z1 = rnorm(n), z2 = rnorm(n))
``````

I.e.

``````> d
# A tibble: 10 x 6
x1     x2       y1      y2     z1       z2
<dbl>  <dbl>    <dbl>   <dbl>  <dbl>    <dbl>
1  1.40  -1.59   0.0458  -0.558   0.484  0.794
2  1.24   0.124 -0.0210  -1.57   -0.234 -2.30
3 -0.234 -1.93   0.804    0.845  -1.90   0.00116
4  0.549  1.12  -0.221   -0.421   0.169  1.11
5  0.633 -0.140  0.00652 -0.200   0.202  1.12
6 -0.257  0.963 -1.86    -0.208   0.237  0.544
7  0.283 -0.152  1.47     0.423   0.747  0.518
8  2.31  -1.46  -0.908    0.603  -0.506  0.850
9 -0.616  0.165  0.651   -0.0481 -1.05  -0.619
10  0.559 -1.18   0.878   -1.19   -1.91   1.02
``````

I want to calculate the row-wise `sd()` across columns `x` and `y`, but NOT `z`. The following works, but is not tidy IMHO:

``````# Equivalent to rowMeans, but for sd
rowSds = function(x){ return(apply(x, 1, sd)) }

# Calculate sd across columns, where the column name contain x or y
d %>% mutate(sd_xy = d %>% select(matches("x|y")) %>% rowSds)
``````

So I am looking for something like this, but without having to write out the variable names:

``````d %>% rowwise %>% mutate(sd_xy = c(x1, x2, y1, y2) %>% sd) %>% ungroup
``````

There must be some NSE way to do this elegantly?

I do believe there is a better way than that...

``````d %>%
select(matches("x|y")) %>%
rowwise %>%
do(data.frame(., sd_xy = sd(unlist(.))))
``````

It appears easier to use apply here (at least for me)!

Thanks for input, but still not really that tidy 1 Like

too bad! I'm probably not enough familiar with tidy meaning last attempt based on that

``````d %>% mutate(sd_xy = pmap_dbl(select(., matches("x|y")), lift_vd(sd)))
``````

thank you for the question, I'm discovering some interesting stuff

4 Likes

So here is a method that I would consider "tidy" in terms of it being all of `tidyverse` functions, but it is certainly more verbose than your current solution. However, if you define it in a function as shown then it is not too bad, IMO:

``````library('tidyverse')
set.seed(733744)
n = 10
d = tibble(x1 = rnorm(n), x2 = rnorm(n),
y1 = rnorm(n), y2 = rnorm(n),
z1 = rnorm(n), z2 = rnorm(n))

rowwise_sd <- function(data, ...){
my_cols <- enquos(...)
data %>%
group_by(row = row_number()) %>%
nest() %>%
mutate(row_sd = map(data, ~{
.x %>%
dplyr::select(!!!my_cols) %>%
gather(key = col, value = value) %>%
summarize(sd = sd(value))
})) %>%
unnest() %>%
select(-row)
}

d %>% rowwise_sd(x1, x2, y1, y2)

#> # A tibble: 10 x 7
#>        x1     x2       y1      y2     z1       z2    sd
#>     <dbl>  <dbl>    <dbl>   <dbl>  <dbl>    <dbl> <dbl>
#>  1  1.40  -1.59   0.0458  -0.558   0.484  0.794   1.25
#>  2  1.24   0.124 -0.0210  -1.57   -0.234 -2.30    1.16
#>  3 -0.234 -1.93   0.804    0.845  -1.90   0.00116 1.30
#>  4  0.549  1.12  -0.221   -0.421   0.169  1.11    0.710
#>  5  0.633 -0.140  0.00652 -0.200   0.202  1.12    0.382
#>  6 -0.257  0.963 -1.86    -0.208   0.237  0.544   1.16
#>  7  0.283 -0.152  1.47     0.423   0.747  0.518   0.688
#>  8  2.31  -1.46  -0.908    0.603  -0.506  0.850   1.69
#>  9 -0.616  0.165  0.651   -0.0481 -1.05  -0.619   0.525
#> 10  0.559 -1.18   0.878   -1.19   -1.91   1.02    1.11

d %>% rowwise_sd(matches("x|y"))
#> # A tibble: 10 x 7
#>        x1     x2       y1      y2     z1       z2    sd
#>     <dbl>  <dbl>    <dbl>   <dbl>  <dbl>    <dbl> <dbl>
#>  1  1.40  -1.59   0.0458  -0.558   0.484  0.794   1.25
#>  2  1.24   0.124 -0.0210  -1.57   -0.234 -2.30    1.16
#>  3 -0.234 -1.93   0.804    0.845  -1.90   0.00116 1.30
#>  4  0.549  1.12  -0.221   -0.421   0.169  1.11    0.710
#>  5  0.633 -0.140  0.00652 -0.200   0.202  1.12    0.382
#>  6 -0.257  0.963 -1.86    -0.208   0.237  0.544   1.16
#>  7  0.283 -0.152  1.47     0.423   0.747  0.518   0.688
#>  8  2.31  -1.46  -0.908    0.603  -0.506  0.850   1.69
#>  9 -0.616  0.165  0.651   -0.0481 -1.05  -0.619   0.525
#> 10  0.559 -1.18   0.878   -1.19   -1.91   1.02    1.11
``````

Created on 2019-01-23 by the reprex package (v0.2.0).

4 Likes

This... This is why this is an awesome forum! Thanks for the discussion @lemairev and @tbradley I'm going with @lemairev's suggestion as a solution, which I definitely think is tidy - Excellent! 1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.