Complicated: Iterate over groups leaving one out using dplyr

Leon · February 26, 2021, 2:28pm

Hi,

I'm trying to do the following in a tidy way.

Given this example data X:

library(tidyverse)

set.seed(365542)
X = tibble(
  id = replicate(sample(LETTERS, 1), n = 100),
  value = sample(1e6, 100),
  correctly_paired = 1
)

Yielding:

> X
# A tibble: 100 x 3
   id     value correctly_paired
   <chr>  <int>            <dbl>
 1 H     717975                1
 2 K     191737                1
 3 T     330275                1
 4 P     740035                1
 5 Z     915701                1
 6 N     919085                1
 7 N     223331                1
 8 T     838440                1
 9 D      61480                1
10 W     472494                1
# … with 90 more rows

For each unique id, I want to add as many new rows as that id already has, with value sampled at random from the rows belonging to the other ids and correctly_paired set to 0.

Something along these lines:

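X_new = tibble()
for( id_i in unique(X$id) ){
  # number of rows belonging to the current id
  n = X %>% 
    filter(id == id_i) %>% 
    nrow()
  # sample that many values from the rows of all other ids
  tmp = X %>% 
    filter(id != id_i) %>% 
    sample_n(size = n) %>% 
    pull(value)
  # collect the mispaired rows under the current id
  X_new = X_new %>%
    bind_rows(
      tibble(
        id = id_i,
        value = tmp,
        correctly_paired = 0
      )
    )
}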

and then

X_new = X %>% 
  bind_rows(X_new)

Yielding:

> X_new %>% count(correctly_paired)
# A tibble: 2 x 2
  correctly_paired     n
*            <dbl> <int>
1                0   100
2                1   100

Is there a clever tidy way of doing this?

emilmalta · February 26, 2021, 5:27pm

You could use a combination of map2 and filter to make a list column of tibbles, where each tibble excludes the rows of that row's own id.

X %>% 
  mutate(data = map2(list(.), id,
    ~ .x %>% filter(id != .y) 
    )
  )

You can then sample from each of those leave-one-out tibbles using map2 and sample_n, drawing as many rows as the id has in X.

X %>% 
  mutate(data = map2(list(.), id,
    ~ .x %>% filter(id != .y) %>% mutate(correctly_paired = 0)
    )
  ) %>% 
  group_by(id, data) %>% 
  tally() %>% 
  mutate(sampled_data = map2(data, n, sample_n)) %>% 
  pull(sampled_data) %>% 
  bind_rows(X)

This might not be the best way of doing it, but it's the tidiest I can come up with.
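
One thing to watch in the snippet above, as far as I can tell: the sampled rows keep the id of the group they were drawn from, whereas the original loop relabels them with the id that was left out. Below is a minimal sketch of one way to keep that relabelling, assuming dplyr >= 1.0 (for slice_sample) and purrr. The names counts and X_mispaired are made up for illustration and are not from the thread.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(365542)
X <- tibble(
  id = replicate(sample(LETTERS, 1), n = 100),
  value = sample(1e6, 100),
  correctly_paired = 1
)

# One row per id with its number of rows (column n)
counts <- count(X, id)

# For each id, sample n values from the rows of all other ids
# and label the new rows with the id that was left out.
X_mispaired <- map2(counts$id, counts$n, function(id_i, n_i) {
  sampled <- X %>%
    filter(id != id_i) %>%
    slice_sample(n = n_i) %>%
    pull(value)
  tibble(id = id_i, value = sampled, correctly_paired = 0)
}) %>%
  bind_rows()

# Original rows followed by the mispaired ones
X_new <- bind_rows(X, X_mispaired)
```

This is still a loop in disguise, just expressed with map2, so it is a matter of taste whether it counts as tidier than the for loop.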

Leon · February 28, 2021, 2:56pm

Cheers. After thinking about this, I think it's probably better done using a custom function.
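
For anyone landing here later, a rough sketch of what such a custom function might look like. This is not the function actually used; sample_other_values and add_mispaired are hypothetical names, and the column names id, value and correctly_paired are hard-coded to match the example data.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: draw n values from the rows of all ids except id_i.
# Assumes the other ids together have at least n rows.
sample_other_values <- function(data, id_i, n) {
  pool <- data$value[data$id != id_i]
  pool[sample.int(length(pool), size = n)]
}

# Hypothetical helper: append one mispaired row for every original row.
add_mispaired <- function(data) {
  mispaired <- data %>%
    group_by(id) %>%
    group_modify(function(rows, key) {
      # rows: the rows of the current id, key: one-row tibble holding that id
      tibble(
        value = sample_other_values(data, key$id, nrow(rows)),
        correctly_paired = 0
      )
    }) %>%
    ungroup()
  bind_rows(data, mispaired)
}

set.seed(365542)
X <- tibble(
  id = replicate(sample(LETTERS, 1), n = 100),
  value = sample(1e6, 100),
  correctly_paired = 1
)

X_new <- add_mispaired(X)
X_new %>% count(correctly_paired)
```

As in the original loop, each level of correctly_paired ends up with 100 rows. Indexing with sample.int rather than calling sample on the pool directly avoids the surprise that sample treats a single number as the range 1 to that number.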
