Hi!
I have a data frame with n rows, and I want to apply a function to every possible combination of k rows of this data frame.
If I can create another data frame with $\binom{n}{k}$ rows, one for each possible combination of k rows, then I can simply run my function over its rows with apply.
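To make the size of that data frame concrete, here is a minimal sketch that checks the row count. The values n = 6 and k = 3 are assumptions chosen to match my example data, and the names n_combinations and index_matrix are only for illustration.

```r
# Assumed sizes, matching the example data in this post.
n <- 6
k <- 3

# Number of ways to choose k rows out of n.
n_combinations <- choose(n, k)

# combn() returns one column per combination, so its column count
# should agree with choose(n, k).
index_matrix <- combn(seq_len(n), k)

n_combinations
#> [1] 20
ncol(index_matrix)
#> [1] 20
```

So the data frame I'm after should have 20 rows for this example.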
My question is how to create such a data frame.
Based on this answer on SO, I can do this in base R (code provided below).
But I'm trying to learn the tidyverse, so I wonder whether there's a way to do this with tidyverse functions.
I can find the relevant row numbers from the output of tidyr::crossing(., .) [which creates all possible pairs] and extract only those. But I suppose there's a better way, and, at least for me, the pattern of the row indices is not that obvious. For k = 2, it's pretty easy. But I fail to find patterns for higher values of k.
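To show what I mean for k = 2, here is a rough sketch of the crossing idea. It is only my own illustration: it crosses row indices rather than the data frame itself, and the names pairs and output_k2 as well as the i < j filter are assumptions on my part, not something taken from the SO answer.

```r
library(tidyr)
library(dplyr)

input <- data.frame(stringsAsFactors = FALSE,
                    s = letters[1:6],
                    C = LETTERS[1:6])

# All ordered pairs of row indices, then keep only i < j so that each
# unordered pair of distinct rows appears exactly once.
pairs <- crossing(i = seq_len(nrow(input)),
                  j = seq_len(nrow(input))) %>%
  filter(i < j)

# Look up the rows for each index and suffix the column names with the
# position of the row inside the pair.
output_k2 <- bind_cols(
  setNames(slice(input, pairs$i), paste0(names(input), 1)),
  setNames(slice(input, pairs$j), paste0(names(input), 2))
)

nrow(output_k2) == choose(nrow(input), 2)
```

For larger k this needs one index column per position and a chained filter (i < j, j < l, and so on), which is exactly the part that I find clumsy.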
Any suggestions will be appreciated.
# example dataset
input <- data.frame(stringsAsFactors = FALSE,
                    s = letters[1:6],
                    C = LETTERS[1:6])
input
#> s C
#> 1 a A
#> 2 b B
#> 3 c C
#> 4 d D
#> 5 e E
#> 6 f F
# for example
k <- 3
# all possible combinations of the row indices of size k
combinations <- combn(x = seq_len(length.out = nrow(x = input)),
                      m = k)
# what I want
expected_output <- data.frame(stringsAsFactors = FALSE,
                              s1 = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b",
                                     "b", "b", "b", "b", "c", "c", "c", "d"),
                              C1 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B",
                                     "B", "B", "B", "B", "C", "C", "C", "D"),
                              s2 = c("b", "b", "b", "b", "c", "c", "c", "d", "d", "e", "c", "c",
                                     "c", "d", "d", "e", "d", "d", "e", "e"),
                              C2 = c("B", "B", "B", "B", "C", "C", "C", "D", "D", "E", "C", "C",
                                     "C", "D", "D", "E", "D", "D", "E", "E"),
                              s3 = c("c", "d", "e", "f", "d", "e", "f", "e", "f", "f", "d", "e",
                                     "f", "e", "f", "f", "e", "f", "f", "f"),
                              C3 = c("C", "D", "E", "F", "D", "E", "F", "E", "F", "F", "D", "E",
                                     "F", "E", "F", "F", "E", "F", "F", "F"))
expected_output
#> s1 C1 s2 C2 s3 C3
#> 1 a A b B c C
#> 2 a A b B d D
#> 3 a A b B e E
#> 4 a A b B f F
#> 5 a A c C d D
#> 6 a A c C e E
#> 7 a A c C f F
#> 8 a A d D e E
#> 9 a A d D f F
#> 10 a A e E f F
#> 11 b B c C d D
#> 12 b B c C e E
#> 13 b B c C f F
#> 14 b B d D e E
#> 15 b B d D f F
#> 16 b B e E f F
#> 17 c C d D e E
#> 18 c C d D f F
#> 19 c C e E f F
#> 20 d D e E f F
# two ways via base R
# way 1: the values are right, but I need to reorder the columns
output_1 <- as.data.frame(x = t(x = apply(X = combinations,
                                          MARGIN = 2,
                                          FUN = function(counter)
                                          {
                                            return(unlist(x = input[counter, ]))
                                          })))
output_1
#> s1 s2 s3 C1 C2 C3
#> 1 a b c A B C
#> 2 a b d A B D
#> 3 a b e A B E
#> 4 a b f A B F
#> 5 a c d A C D
#> 6 a c e A C E
#> 7 a c f A C F
#> 8 a d e A D E
#> 9 a d f A D F
#> 10 a e f A E F
#> 11 b c d B C D
#> 12 b c e B C E
#> 13 b c f B C F
#> 14 b d e B D E
#> 15 b d f B D F
#> 16 b e f B E F
#> 17 c d e C D E
#> 18 c d f C D F
#> 19 c e f C E F
#> 20 d e f D E F
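# way 2: the column order is right, but I need to rename the columns
output_2 <- as.data.frame(x = t(x = apply(X = combinations,
                                          MARGIN = 2,
                                          FUN = function(counter)
                                          {
                                            # one column per selected row; row_index avoids shadowing base::t
                                            rbind(sapply(X = counter,
                                                         FUN = function(row_index)
                                                         {
                                                           unlist(x = input[row_index, ])
                                                         }))
                                          })))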
output_2
#> V1 V2 V3 V4 V5 V6
#> 1 a A b B c C
#> 2 a A b B d D
#> 3 a A b B e E
#> 4 a A b B f F
#> 5 a A c C d D
#> 6 a A c C e E
#> 7 a A c C f F
#> 8 a A d D e E
#> 9 a A d D f F
#> 10 a A e E f F
#> 11 b B c C d D
#> 12 b B c C e E
#> 13 b B c C f F
#> 14 b B d D e E
#> 15 b B d D f F
#> 16 b B e E f F
#> 17 c C d D e E
#> 18 c C d D f F
#> 19 c C e E f F
#> 20 d D e E f F
Created on 2019-03-23 by the reprex package (v0.2.1)