Hi,
I'm trying to do the following in a tidy way. Given this example data `X`:
```r
library(dplyr)  # tibble() and %>% are re-exported by dplyr

set.seed(365542)
X = tibble(
  id = replicate(sample(LETTERS, 1), n = 100),
  value = sample(1e6, 100),
  correctly_paired = 1
)
```
Yielding:

```
> X
# A tibble: 100 x 3
   id     value correctly_paired
   <chr>  <int>            <dbl>
 1 H     717975                1
 2 K     191737                1
 3 T     330275                1
 4 P     740035                1
 5 Z     915701                1
 6 N     919085                1
 7 N     223331                1
 8 T     838440                1
 9 D      61480                1
10 W     472494                1
# … with 90 more rows
```
I want to iterate over each unique `id` and, for each of its rows, randomly assign a `value` sampled from the rows of the remaining `id`s, and then mutate the `correctly_paired` variable to `0`.
My loop-based attempt looks like this:
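```r
X_new = tibble()
for (id_i in unique(X$id)) {
  # number of rows belonging to the current id
  n = X %>%
    filter(id == id_i) %>%
    nrow()
  # draw n values (without replacement) from the rows of all other ids
  tmp = X %>%
    filter(id != id_i) %>%
    sample_n(size = n) %>%
    pull(value)
  # append the mismatched pairs for this id
  X_new = X_new %>%
    bind_rows(
      tibble(
        id = id_i,
        value = tmp,
        correctly_paired = 0
      )
    )
}
```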
and then:

```r
X_new = X %>%
  bind_rows(X_new)
```
Yielding:

```
> X_new %>% count(correctly_paired)
# A tibble: 2 x 2
  correctly_paired     n
*            <dbl> <int>
1                0   100
2                1   100
```
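The count confirms only the number of rows per flag, not that every mismatched `value` really came from a different `id`. A small check, sketched here under the assumption that each `value` is unique in the original data (true for `sample(1e6, 100)`, which samples without replacement), joins each mismatched row back to its original owner; the function name `all_mismatched` and the toy data are hypothetical:

```r
library(dplyr)

# Hypothetical helper: TRUE when every row flagged correctly_paired == 0
# carries a value that originally belonged to a different id.
all_mismatched <- function(X_new) {
  originals <- X_new %>%
    filter(correctly_paired == 1) %>%
    select(value, id_orig = id)
  X_new %>%
    filter(correctly_paired == 0) %>%
    inner_join(originals, by = "value") %>%
    summarise(ok = all(id != id_orig)) %>%
    pull(ok)
}

# Toy data: the last four rows reuse values owned by other ids
toy <- tibble(
  id = c("A", "A", "B", "C", "A", "A", "B", "C"),
  value = c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L),
  correctly_paired = c(1, 1, 1, 1, 0, 0, 0, 0)
)
all_mismatched(toy)
```

The same call can be applied to the `X_new` built above.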
Is there a clever tidy way of doing this?
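One possible tidy formulation, sketched under the assumption that dplyr 1.0 or later is available (for `cur_group()`), is a grouped `mutate()`: inside each `id` group, `cur_group()$id` gives the current id, while `X$value` and `X$id` still refer to the full, ungrouped tibble, so the pool of candidate values is every value belonging to another id:

```r
library(dplyr)

set.seed(365542)
X <- tibble(
  id = replicate(sample(LETTERS, 1), n = 100),
  value = sample(1e6, 100),
  correctly_paired = 1
)

X_mismatched <- X %>%
  group_by(id) %>%
  mutate(
    value = {
      # values from every row whose id differs from this group's id
      pool <- X$value[X$id != cur_group()$id]
      # n() = group size; draw that many without replacement
      pool[sample.int(length(pool), n())]
    },
    correctly_paired = 0
  ) %>%
  ungroup()

X_new <- bind_rows(X, X_mismatched)
X_new %>% count(correctly_paired)
```

Because a grouped `mutate()` keeps rows in place, each mismatched row sits at the same position as the original row it shadows. As in the loop, values are drawn without replacement within one `id` but may repeat across different `id`s.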