 # A purrr-less way of doing this?

I'm wondering what would be a "purrr-less" but still tidyverse approach to doing the following:

```r
library(tidyverse)

df <- tribble(
  ~var1, ~var2,
  'a', 1,
  'a', 2,
  'a', 3,
  'b', 1,
  'b', 1,
  'b', 3
)

df %>%
  group_by(var1) %>%
  mutate(
    n_lower = map_dbl(var2, ~sum(var2 < .x))
  )
```

My feeling is that it should be possible to do something similar to:

```r
df %>%
  group_by(var1) %>%
  mutate(
    n_lower = sum(var1 < var1)
  )
```

... by telling R in some way that the first `var1` refers to the whole group of observations rather than the single one.

Any ideas?

Hey there. It's an interesting question. I'm not sure what your motivation is for avoiding `purrr::map_*`, which seems the ideal tool for the job since you want to iterate a function over a vector. Anyway, I had a crack at doing it with `dplyr` alone, but my skills were not up to it, despite some frustrating hours of trying!

Here are a few thoughts:

• I suspect in your original `map_dbl` example, if I understand you right, what you really want to say is:
```r
df %>%
  group_by(var1) %>%
  mutate(
    n_lower = map_dbl(var2, ~sum(var2[var2 < .x]))
  )
```

... this gives a different result to your example. And in your second, non-purrr example, you've written var1 when presumably you mean var2.

• And then I thought: you want a vector result from `mutate` of length 6, to fill your `n_lower` column. If you do `group_by` followed by `summarise`, you'll get a result of length 2 (one sum for each group in `var1`), whereas a grouped `mutate` keeps all 6 rows. My instinct was to reach for `dplyr::filter` rather than `group_by`.

• And the tool to meet your need of distinguishing `var1` as a column name from `var1` as a variable argument to sum is probably `tidyeval`.
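To make the first two points concrete, here is a small sketch on the example data. It assumes only `dplyr` is loaded and uses base R's `vapply` so that the comparison does not depend on purrr; the names `x`, `n_lower`, `sum_lower`, `mutated` and `summarised` are just for illustration.

```r
library(dplyr)

# Values of var2 in group 'a' from the example data
x <- c(1, 2, 3)

# Count of values strictly lower than each element (the original post)
n_lower <- vapply(x, function(v) sum(x < v), numeric(1))

# Sum of the values strictly lower than each element (the variant above)
sum_lower <- vapply(x, function(v) sum(x[x < v]), numeric(1))

n_lower    # 0 1 2
sum_lower  # 0 1 3

# Row counts: a grouped mutate() keeps every row,
# while summarise() collapses to one row per group
df <- tibble(
  var1 = rep(c("a", "b"), each = 3),
  var2 = c(1, 2, 3, 1, 1, 3)
)

mutated    <- df %>% group_by(var1) %>% mutate(group_total = sum(var2))
summarised <- df %>% group_by(var1) %>% summarise(group_total = sum(var2))

nrow(mutated)     # 6
nrow(summarised)  # 2
```

The two readings only agree on the first two elements here, so it is worth deciding up front whether a count or a sum is wanted.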

Here's my attempt, which doesn't work. It seems to take just the first result from row 1, which is zero, and use it to fill every row of `n_lower`, rather than doing the mutate separately for each row of the tibble. (This is why I love `purrr`!)
I would be grateful for enlightenment on this problem myself!

```r
lower_sum <- function(df, var1, var2) {
  df %>%
    filter({{var1}} == var1) %>%
    summarise(sum({{var2}}[{{var2}} < var2])) %>%
    pull(.)
}

df %>%
  mutate(n_lower = lower_sum(df, var1, var2))
```

I think I know why this doesn't work (`lower_sum(df, var1, var2)` just returns a single value 0 rather than a vector) but my brain hurts too much now to fix it.
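A sketch of what seems to be going on, assuming dplyr's usual data-masking rules: inside `filter()` and `summarise()`, both `{{var2}}` and the bare `var2` resolve to the same column, so each value is compared with itself. The names `self_compare` and `recycled` are illustrative only.

```r
library(dplyr)

df <- tibble(
  var1 = rep(c("a", "b"), each = 3),
  var2 = c(1, 2, 3, 1, 1, 3)
)

# Comparing a column with itself is FALSE everywhere, so nothing is selected
self_compare <- df %>%
  summarise(total = sum(var2[var2 < var2])) %>%
  pull(total)

self_compare  # 0, the sum of an empty selection

# mutate() then recycles that length-1 result down the whole column
recycled <- df %>% mutate(n_lower = self_compare)

recycled$n_lower  # 0 0 0 0 0 0
```

In other words, the helper never sees "the current row's value" as something distinct from the column, which is the distinction `map_dbl` provides through `.x`.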

I reckon this is do-able using extra/duplicate columns but that feels like cheating.

This might be useful, or this or this or this ... I'm not sure.

This doesn't do what I want either:

```r
lower_sum <- function(df, var1, var2) {
  df %>%
    filter({{var1}} == var1) %>%
    filter({{var2}} < var2) %>%
    add_tally(wt = {{var2}}, name = "n_lower")
}

lower_sum(df, var1, var2)
```
```r
df %>%
  group_by(var1) %>%
  mutate(var2_c = list(var2)) %>%
  rowwise() %>%
  mutate(n_lower = sum(var2 > var2_c))
```
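For anyone following along, here is a sketch of how that list-column approach plays out on the example data, next to one further purrr-free option that is not from the thread: counting strictly lower values within a group is the same as taking the minimum rank and subtracting one, which `dplyr::min_rank()` can do directly. The names `via_rowwise` and `via_rank` are illustrative, and the sketch assumes dplyr 1.0 or later, where `rowwise()` unwraps list-column elements.

```r
library(dplyr)

df <- tibble(
  var1 = rep(c("a", "b"), each = 3),
  var2 = c(1, 2, 3, 1, 1, 3)
)

# List-column approach: every row carries a copy of its group's values
via_rowwise <- df %>%
  group_by(var1) %>%
  mutate(var2_c = list(var2)) %>%
  rowwise() %>%
  # Within a row, var2 is a single value and var2_c is the whole group
  mutate(n_lower = sum(var2_c < var2)) %>%
  ungroup() %>%
  select(-var2_c)

# Rank-based alternative: ties share the lowest rank, so
# (minimum rank - 1) is the number of strictly lower values
via_rank <- df %>%
  group_by(var1) %>%
  mutate(n_lower = min_rank(var2) - 1) %>%
  ungroup()

via_rowwise$n_lower  # 0 1 2 0 0 2
via_rank$n_lower     # 0 1 2 0 0 2
```

The rank version avoids the extra column and the row-by-row pass, though it only covers this particular "count of lower values" case; the list-column pattern is more general.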

Thanks both!

@francisbarton There's no particular reason why I would want to avoid purrr; I was just wondering if and how it could be done without it.

@nirgrahamuk Interesting, thanks!
