Hey there. It's an interesting question. I'm not sure what your motivation is for not using {purrr::map_*}, which seems the ideal tool for the job since you want to iterate a function over a vector. But anyway, I had a crack at doing it with {dplyr}, but my skills were not adequate despite some frustrating hours of trying!
Here are a few thoughts:
- I suspect that in your original map_dbl example, if I understand you right, what you really want to say is:

  ```r
  df %>%
    group_by(var1) %>%
    mutate(
      n_lower = map_dbl(var2, ~ sum(var2[var2 < .x]))
    )
  ```

  This gives a different result from your example. And in your second, non-purrr example, you've written var1 where presumably you mean var2.
- And then I thought: you want a vector result of length 6 from mutate, to fill your $n_lower column. If you group_by and then summarise, you'll get a result of length 2 (one sum for each group in var1). So what you really want is dplyr::filter, not group_by.
- And the tool for distinguishing var1 as a column name from var1 as a variable argument to sum is probably tidyeval.
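To make the first point concrete, here's a small sketch on made-up data (two var1 groups of three rows; I'm only guessing at the shape of your df). It runs the corrected map_dbl version, plus a no-purrr variant that stays inside group_by() + mutate(): outer() compares every var2 in a group with every other var2 in that group, and rowSums() adds up the lower ones. Note that group_by() followed by mutate() keeps all six rows here; it's only summarise() that collapses to one row per group.

```r
library(dplyr)
library(purrr)

# Made-up example data: two groups in var1, numeric values in var2
df <- tibble(
  var1 = c("a", "a", "a", "b", "b", "b"),
  var2 = c(1, 2, 3, 4, 5, 6)
)

# purrr version: for each row, sum the var2 values in the same var1 group
# that are strictly lower than that row's var2
with_purrr <- df %>%
  group_by(var1) %>%
  mutate(n_lower = map_dbl(var2, ~ sum(var2[var2 < .x]))) %>%
  ungroup()

# No-purrr version: element [i, j] of the outer() matrix is var2[j] when
# var2[j] < var2[i] and 0 otherwise, so each row sum is the total of the
# values lower than var2[i] within the group
without_purrr <- df %>%
  group_by(var1) %>%
  mutate(n_lower = rowSums(outer(var2, var2, function(x, y) (y < x) * y))) %>%
  ungroup()

without_purrr
```

Both should give n_lower = 0, 1, 3, 0, 4, 9 for this made-up data. The outer() version builds an n-by-n matrix per group, so I'd only expect it to be reasonable for modest group sizes.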
Here's my attempt, which doesn't work, however. It seems to take just the first result from row 1, which is zero, and use it to fill every row of $n_lower, rather than doing the mutate separately for each row of the tibble. (This is why I love {purrr}!)
I would be grateful for enlightenment on this problem myself!
```r
library(dplyr)

# My (non-working) attempt: every row of n_lower ends up as 0
lower_sum <- function(df, var1, var2) {
  df %>%
    filter({{var1}} == var1) %>%
    summarise(sum({{var2}}[{{var2}} < var2])) %>%
    pull(.)
}

df %>%
  mutate(n_lower = lower_sum(df, var1, var2))
```
I think I know why this doesn't work (lower_sum(df, var1, var2) just returns a single value, 0, rather than a vector), but my brain hurts too much now to fix it.
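On reflection, I suspect the deeper cause is data masking: inside filter() and summarise(), a bare var1 or var2 is looked up among the data's columns before the function's arguments, so var1 == var1 is always TRUE and var2[var2 < var2] is always empty, hence the 0 (which mutate() then recycles to every row). Below is a sketch of one way round it, on made-up data. It hard-codes the column names and rests on two choices of mine: the .env pronoun, to force the row's values to be read from the function's arguments (group_val and value are hypothetical names), and dplyr::rowwise(), to make mutate() call the function once per row. It's still a loop in disguise, and a slow one, so I wouldn't call it better than map_dbl.

```r
library(dplyr)

# Made-up example data (6 rows, 2 groups)
df <- tibble(
  var1 = c("a", "a", "a", "b", "b", "b"),
  var2 = c(1, 2, 3, 4, 5, 6)
)

# group_val and value are scalars taken from one row; .env$ makes filter()
# read them from the function's arguments instead of the data's columns
lower_sum <- function(data, group_val, value) {
  data %>%
    filter(var1 == .env$group_val, var2 < .env$value) %>%
    summarise(total = sum(var2)) %>%  # an empty filter result sums to 0
    pull(total)
}

# rowwise() makes mutate() evaluate lower_sum() once per row, so var1 and
# var2 are that row's values; df is not a column, so it refers to the whole tibble
result <- df %>%
  rowwise() %>%
  mutate(n_lower = lower_sum(df, var1, var2)) %>%
  ungroup()

result
```

For this made-up data, n_lower should come out as 0, 1, 3, 0, 4, 9.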
I reckon this is doable using extra/duplicate columns, but that feels like cheating.
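For what it's worth, the extra/duplicate columns route might look something like this sketch (again on made-up data, and my guess rather than a tested answer): copy var2 into a hypothetical var2_other column, pair every row with every row in the same var1 group, then sum only the partners that are lower. I've used base merge() for the pairing to sidestep the many-to-many join warning that recent dplyr versions can give; the id column is another hypothetical helper that keeps rows with equal var2 values apart.

```r
library(dplyr)

# Made-up example data (6 rows, 2 groups)
df <- tibble(
  var1 = c("a", "a", "a", "b", "b", "b"),
  var2 = c(1, 2, 3, 4, 5, 6)
)

# Give each row an id so duplicate var2 values stay distinguishable
df_id <- df %>% mutate(id = row_number())

# Duplicate var2 under a new name to act as the "other row" in each pair
partners <- df %>% select(var1, var2_other = var2)

result <- df_id %>%
  # base merge() pairs every row with every row sharing its var1 value
  merge(partners, by = "var1") %>%
  group_by(id, var1, var2) %>%
  # keep only partner values strictly lower than this row's var2
  summarise(n_lower = sum(var2_other[var2_other < var2]), .groups = "drop") %>%
  arrange(id) %>%
  select(-id)

result
```

The pairing step creates n-squared rows per group before summarising, so this is probably the least efficient of the options.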
This might be useful, or this or this or this ... I'm not sure.