A purrr-less way of doing this?

I'm wondering what would be a "purrr-less" but still tidyverse approach to doing the following:

library(tidyverse)

df <- tribble(
  ~var1, ~var2,
  'a', 1,
  'a', 2,
  'a', 3,
  'b', 1,
  'b', 1,
  'b', 3
)

df %>% 
  group_by(var1) %>%
  mutate(
      n_lower = map_dbl(var2, ~sum(var2 < .x))
  )

My feeling is that it should be possible to do something similar to:

df %>% 
  group_by(var1) %>%
  mutate(
      n_lower = sum(var1 < var1)
  )

... by telling R in some way that the first var1 refers to the whole group of observations rather than the single one.

Any ideas?

Hey there. It's an interesting question. I'm not sure what your motivation is for not using purrr::map_*(), which seems the ideal tool for the job, since you want to apply a function to each element of a vector. Anyway, I had a crack at doing it with {dplyr} alone, but my skills were not adequate, despite some frustrating hours of trying!

Here are a few thoughts:

  • I suspect in your original map_dbl example, if I understand you right, what you really want to say is:
df %>% 
  group_by(var1) %>%
  mutate(
    n_lower = map_dbl(var2, ~sum(var2[var2 < .x]))
  )

... this gives a different result from your example: it sums the lower values rather than counting them. And in your second, non-purrr example, you've written var1 when presumably you mean var2.


  • And then I thought: you want a vector result from mutate of length 6, to fill your $n_lower column. If you do group_by and then summarise, you'll get a result of length 2 (one sum for each group in var1), whereas mutate keeps one value per row. So my thought was to try dplyr::filter rather than group_by.

  • And the tool to meet your need of distinguishing var1 as a column name from var1 as a variable argument to sum is probably tidyeval.
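To make the first two points concrete, here is a minimal sketch that puts both versions side by side on the data from the question. The column name sum_lower is my own, not something from the original post. It also shows that mutate after group_by still returns one value per row, so the result has six rows, not two.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tribble(
  ~var1, ~var2,
  "a", 1,
  "a", 2,
  "a", 3,
  "b", 1,
  "b", 1,
  "b", 3
)

res <- df %>%
  group_by(var1) %>%
  mutate(
    # how many values in the group are strictly lower than this row's value
    n_lower = map_dbl(var2, ~ sum(var2 < .x)),
    # sum of the values in the group that are strictly lower (hypothetical name)
    sum_lower = map_dbl(var2, ~ sum(var2[var2 < .x]))
  ) %>%
  ungroup()

nrow(res)      # 6
res$n_lower    # 0 1 2 0 0 2
res$sum_lower  # 0 1 3 0 0 2
```

The two columns only differ on the third row here, which is probably why the two readings are easy to confuse on such a small example.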


Here's my attempt, which doesn't work, however. It seems to take just the first result from row 1, which is zero, and use it to fill every row of $n_lower, rather than doing the mutate separately for each row of the tibble. (This is why I love {purrr}!)
I would be grateful for enlightenment on this problem myself!

lower_sum <- function(df, var1, var2) {
    df %>% 
    filter({{var1}} == var1) %>% 
    summarise(sum({{var2}}[{{var2}} < var2])) %>% 
    pull(.)
}

df %>% 
  mutate(n_lower = lower_sum(df, var1, var2))

I think I know why this doesn't work (lower_sum(df, var1, var2) just returns a single value 0 rather than a vector) but my brain hurts too much now to fix it.
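For what it's worth, one way to get a full vector back instead of a single value might be a small helper that works on the column itself rather than on the whole data frame. This is only a sketch: count_lower is a hypothetical name, and it leans on base R's outer() and colSums() rather than anything from {dplyr}, so it may not count as a purely tidyverse answer.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~var1, ~var2,
  "a", 1,
  "a", 2,
  "a", 3,
  "b", 1,
  "b", 1,
  "b", 3
)

# Hypothetical helper: for each element of x, count how many elements of x
# are strictly lower. outer(x, x, "<")[i, j] is x[i] < x[j], so the sum of
# column j is the number of values below x[j].
count_lower <- function(x) {
  colSums(outer(x, x, "<"))
}

res <- df %>%
  group_by(var1) %>%
  mutate(n_lower = count_lower(var2)) %>%
  ungroup()

res$n_lower  # 0 1 2 0 0 2
```

Because the helper returns a vector the same length as its input, mutate can use it once per group without any row-by-row iteration. Note that outer() builds an n-by-n matrix per group, so this is likely only sensible for modest group sizes.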

I reckon this is do-able using extra/duplicate columns but that feels like cheating.

This might be useful, or this or this or this ... I'm not sure.

df %>%
  group_by(var1) %>%
  mutate(var2_c = list(var2)) %>%
  rowwise() %>%
  mutate(n_lower = sum(var2 > var2_c))

Thanks both!

@francisbarton There's no reason why I wouldn't want to use purrr; I was just wondering if/how it could be done without.

@nirgrahamuk Interesting, thanks!


This doesn't do what I want either:

lower_sum <- function(df, var1, var2) {
    df %>% 
    filter({{var1}} == var1) %>% 
    filter({{var2}} < var2) %>% 
    add_tally(wt = {{var2}}, name = "n_lower")
}

lower_sum(df, var1, var2)