calculating the total number of unique combinations

Hi,

This is my first post here. I'm sure this is a math problem more than an R problem, but here goes.

I'm trying to figure out two things: a calculation, and a way of getting this into a table.

The calculation is the following: if I have a total of 27 tests, and each test can have two outcomes A and B, how many unique combinations are there? So ranging from 27 A & no B (1 unique outcome); 26 A and 1 B (27 unique outcomes, one for each column); etc.; until 0 A & 27 B at the other end.

I have no idea how to even calculate this. I'm not good at maths.

I tried to create a table that would contain all these unique possibilities (that's another way of finding out the total), and I came up with this n00btastic inelegant monstrosity:

expand.grid(test01=c(0,1), test02=c(0,1), test03=c(0,1), test04=c(0,1), test05=c(0,1), test06=c(0,1), test07=c(0,1), test08=c(0,1), test09=c(0,1), test10=c(0,1), test11=c(0,1), test12=c(0,1), test13=c(0,1), test14=c(0,1), test15=c(0,1), test16=c(0,1), test17=c(0,1), test18=c(0,1), test19=c(0,1), test20=c(0,1), test21=c(0,1), test22=c(0,1), test23=c(0,1), test24=c(0,1), test25=c(0,1), test26=c(0,1), test27=c(0,1))

Unfortunately, this gives me an error message: "Error: cannot allocate vector of size 1024.0 Mb"

I tried this with only 5, 15 and 20 tests, and it worked fine for those numbers. So I guess using all 27 tests causes a buffer overflow or something.

Is there a way around this vector size limit? Would the function choose or lchoose help?

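Choosing 2 of the 27 tests without regard to order is a combination of 27 elements taken 2 at a time, which produces 351 unique results. This is distinct from a permutation, which takes order into account: there are 27 × 26 = 702 ordered pairs of distinct elements, or 27^2 = 729 if an element may be repeated. Note that these figures count pairs of tests; the number of A/B patterns across all 27 tests, which is what the question describes, is 2^27 = 134,217,728.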

combn(27,2)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]    1    1    1    1    1    1    1    1    1     1     1     1     1     1
#> [2,]    2    3    4    5    6    7    8    9   10    11    12    13    14    15
#>      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
#> [1,]     1     1     1     1     1     1     1     1     1     1     1     1
#> [2,]    16    17    18    19    20    21    22    23    24    25    26    27
#>      [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#> [1,]     2     2     2     2     2     2     2     2     2     2     2     2
#> [2,]     3     4     5     6     7     8     9    10    11    12    13    14
#>      [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
#> [1,]     2     2     2     2     2     2     2     2     2     2     2     2
#> [2,]    15    16    17    18    19    20    21    22    23    24    25    26
#>      [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62]
#> [1,]     2     3     3     3     3     3     3     3     3     3     3     3
#> [2,]    27     4     5     6     7     8     9    10    11    12    13    14
#>      [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74]
#> [1,]     3     3     3     3     3     3     3     3     3     3     3     3
#> [2,]    15    16    17    18    19    20    21    22    23    24    25    26
#>      [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86]
#> [1,]     3     4     4     4     4     4     4     4     4     4     4     4
#> [2,]    27     5     6     7     8     9    10    11    12    13    14    15
#>      [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98]
#> [1,]     4     4     4     4     4     4     4     4     4     4     4     4
#> [2,]    16    17    18    19    20    21    22    23    24    25    26    27
#>      [,99] [,100] [,101] [,102] [,103] [,104] [,105] [,106] [,107] [,108]
#> [1,]     5      5      5      5      5      5      5      5      5      5
#> [2,]     6      7      8      9     10     11     12     13     14     15
#>      [,109] [,110] [,111] [,112] [,113] [,114] [,115] [,116] [,117] [,118]
#> [1,]      5      5      5      5      5      5      5      5      5      5
#> [2,]     16     17     18     19     20     21     22     23     24     25
#>      [,119] [,120] [,121] [,122] [,123] [,124] [,125] [,126] [,127] [,128]
#> [1,]      5      5      6      6      6      6      6      6      6      6
#> [2,]     26     27      7      8      9     10     11     12     13     14
#>      [,129] [,130] [,131] [,132] [,133] [,134] [,135] [,136] [,137] [,138]
#> [1,]      6      6      6      6      6      6      6      6      6      6
#> [2,]     15     16     17     18     19     20     21     22     23     24
#>      [,139] [,140] [,141] [,142] [,143] [,144] [,145] [,146] [,147] [,148]
#> [1,]      6      6      6      7      7      7      7      7      7      7
#> [2,]     25     26     27      8      9     10     11     12     13     14
#>      [,149] [,150] [,151] [,152] [,153] [,154] [,155] [,156] [,157] [,158]
#> [1,]      7      7      7      7      7      7      7      7      7      7
#> [2,]     15     16     17     18     19     20     21     22     23     24
#>      [,159] [,160] [,161] [,162] [,163] [,164] [,165] [,166] [,167] [,168]
#> [1,]      7      7      7      8      8      8      8      8      8      8
#> [2,]     25     26     27      9     10     11     12     13     14     15
#>      [,169] [,170] [,171] [,172] [,173] [,174] [,175] [,176] [,177] [,178]
#> [1,]      8      8      8      8      8      8      8      8      8      8
#> [2,]     16     17     18     19     20     21     22     23     24     25
#>      [,179] [,180] [,181] [,182] [,183] [,184] [,185] [,186] [,187] [,188]
#> [1,]      8      8      9      9      9      9      9      9      9      9
#> [2,]     26     27     10     11     12     13     14     15     16     17
#>      [,189] [,190] [,191] [,192] [,193] [,194] [,195] [,196] [,197] [,198]
#> [1,]      9      9      9      9      9      9      9      9      9      9
#> [2,]     18     19     20     21     22     23     24     25     26     27
#>      [,199] [,200] [,201] [,202] [,203] [,204] [,205] [,206] [,207] [,208]
#> [1,]     10     10     10     10     10     10     10     10     10     10
#> [2,]     11     12     13     14     15     16     17     18     19     20
#>      [,209] [,210] [,211] [,212] [,213] [,214] [,215] [,216] [,217] [,218]
#> [1,]     10     10     10     10     10     10     10     11     11     11
#> [2,]     21     22     23     24     25     26     27     12     13     14
#>      [,219] [,220] [,221] [,222] [,223] [,224] [,225] [,226] [,227] [,228]
#> [1,]     11     11     11     11     11     11     11     11     11     11
#> [2,]     15     16     17     18     19     20     21     22     23     24
#>      [,229] [,230] [,231] [,232] [,233] [,234] [,235] [,236] [,237] [,238]
#> [1,]     11     11     11     12     12     12     12     12     12     12
#> [2,]     25     26     27     13     14     15     16     17     18     19
#>      [,239] [,240] [,241] [,242] [,243] [,244] [,245] [,246] [,247] [,248]
#> [1,]     12     12     12     12     12     12     12     12     13     13
#> [2,]     20     21     22     23     24     25     26     27     14     15
#>      [,249] [,250] [,251] [,252] [,253] [,254] [,255] [,256] [,257] [,258]
#> [1,]     13     13     13     13     13     13     13     13     13     13
#> [2,]     16     17     18     19     20     21     22     23     24     25
#>      [,259] [,260] [,261] [,262] [,263] [,264] [,265] [,266] [,267] [,268]
#> [1,]     13     13     14     14     14     14     14     14     14     14
#> [2,]     26     27     15     16     17     18     19     20     21     22
#>      [,269] [,270] [,271] [,272] [,273] [,274] [,275] [,276] [,277] [,278]
#> [1,]     14     14     14     14     14     15     15     15     15     15
#> [2,]     23     24     25     26     27     16     17     18     19     20
#>      [,279] [,280] [,281] [,282] [,283] [,284] [,285] [,286] [,287] [,288]
#> [1,]     15     15     15     15     15     15     15     16     16     16
#> [2,]     21     22     23     24     25     26     27     17     18     19
#>      [,289] [,290] [,291] [,292] [,293] [,294] [,295] [,296] [,297] [,298]
#> [1,]     16     16     16     16     16     16     16     16     17     17
#> [2,]     20     21     22     23     24     25     26     27     18     19
#>      [,299] [,300] [,301] [,302] [,303] [,304] [,305] [,306] [,307] [,308]
#> [1,]     17     17     17     17     17     17     17     17     18     18
#> [2,]     20     21     22     23     24     25     26     27     19     20
#>      [,309] [,310] [,311] [,312] [,313] [,314] [,315] [,316] [,317] [,318]
#> [1,]     18     18     18     18     18     18     18     19     19     19
#> [2,]     21     22     23     24     25     26     27     20     21     22
#>      [,319] [,320] [,321] [,322] [,323] [,324] [,325] [,326] [,327] [,328]
#> [1,]     19     19     19     19     19     20     20     20     20     20
#> [2,]     23     24     25     26     27     21     22     23     24     25
#>      [,329] [,330] [,331] [,332] [,333] [,334] [,335] [,336] [,337] [,338]
#> [1,]     20     20     21     21     21     21     21     21     22     22
#> [2,]     26     27     22     23     24     25     26     27     23     24
#>      [,339] [,340] [,341] [,342] [,343] [,344] [,345] [,346] [,347] [,348]
#> [1,]     22     22     22     23     23     23     23     24     24     24
#> [2,]     25     26     27     24     25     26     27     25     26     27
#>      [,349] [,350] [,351]
#> [1,]     25     25     26
#> [2,]     26     27     27
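
Read literally, the question counts outcome patterns (every way of assigning A or B to each of the 27 tests) rather than pairs of tests. Here is a minimal sketch of that count, broken down the way the question describes it, plus the arithmetic behind the "cannot allocate vector of size 1024.0 Mb" error. The variable names are just illustrative, and the memory figure assumes the 0/1 columns are stored as ordinary 8-byte numerics, as c(0,1) would be.

```r
n_tests <- 27

# Each test independently has 2 outcomes, so there are 2^27 outcome patterns.
total_patterns <- 2^n_tests
total_patterns
#> [1] 134217728

# The same total, built up as in the question:
# choose(n, k) patterns contain exactly k results of type B.
by_number_of_B <- data.frame(
  n_B      = 0:n_tests,
  patterns = choose(n_tests, 0:n_tests)
)
head(by_number_of_B, 3)
#>   n_B patterns
#> 1   0        1
#> 2   1       27
#> 3   2      351
sum(by_number_of_B$patterns) == total_patterns
#> [1] TRUE

# Memory for ONE numeric column of the full expand.grid() table, in Mb.
total_patterns * 8 / 2^20
#> [1] 1024

# The same table for a small number of tests, without typing every argument.
small <- expand.grid(rep(list(0:1), 5))
nrow(small)
#> [1] 32
```

So the count itself needs no table at all. The full table would need 27 such columns, roughly 27 Gb, which is why it works for 20 tests but not for 27.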

Every R problem can be thought of with advantage as the interaction of three objects: an existing object, x, a desired object, y, and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

Here x consists of 27 tests, each with 2 possible outcomes. In practice, this would be represented by a data frame in which test subjects are rows, the 27 tests are columns, and the entries are either A or B. The y to be determined is the maximum number of different outcomes in the data frame. That can be no greater than the smaller of two numbers: the number of test subjects, nrow(the_data_frame), and the number of possible outcomes, as returned by a function f that enumerates them.

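In other words, if there are only 30 different subjects, there can be, at most, 30 different outcomes. With 1,000 subjects there can be as many as 1,000, because the number of possible A/B patterns across 27 tests, 2^27 = 134,217,728, is far larger. The figure of 351 counts pairs of tests, not outcome patterns.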
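
To make the bound concrete, here is a rough sketch with made-up data. The data frame, its column names and the random seed are all hypothetical, and the number of possible patterns is taken to be 2^27 (two outcomes for each of 27 tests).

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n_subjects <- 30
n_tests    <- 27

# Hypothetical results: one row per subject, one column per test,
# every entry either "A" or "B".
results <- as.data.frame(
  matrix(sample(c("A", "B"), n_subjects * n_tests, replace = TRUE),
         nrow = n_subjects, ncol = n_tests)
)
names(results) <- sprintf("test%02d", seq_len(n_tests))

# Number of distinct outcome patterns actually observed.
observed <- nrow(unique(results))

# Upper bound: the smaller of the number of subjects
# and the number of possible patterns.
upper_bound <- min(nrow(results), 2^n_tests)

observed <= upper_bound
#> [1] TRUE
```

With real data, nrow(unique(the_data_frame)) gives the observed count directly, without enumerating every possible pattern.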

If you had to write f yourself, it might look like combn, which comes for free with R:

function (x, m, FUN = NULL, simplify = TRUE, ...) 
{
    stopifnot(length(m) == 1L, is.numeric(m))
    if (m < 0) 
        stop("m < 0", domain = NA)
    if (is.numeric(x) && length(x) == 1L && x > 0 && trunc(x) == 
        x) 
        x <- seq_len(x)
    n <- length(x)
    if (n < m) 
        stop("n < m", domain = NA)
    x0 <- x
    if (simplify) {
        if (is.factor(x)) 
            x <- as.integer(x)
    }
    m <- as.integer(m)
    e <- 0
    h <- m
    a <- seq_len(m)
    nofun <- is.null(FUN)
    if (!nofun && !is.function(FUN)) 
        stop("'FUN' must be a function or NULL")
    len.r <- length(r <- if (nofun) x[a] else FUN(x[a], ...))
    count <- as.integer(round(choose(n, m)))
    if (simplify) {
        dim.use <- if (nofun) 
            c(m, count)
        else {
            d <- dim(r)
            if (length(d) > 1L) 
                c(d, count)
            else if (len.r != 1L) 
                c(len.r, count)
            else c(d, count)
        }
    }
    if (simplify) 
        out <- matrix(r, nrow = len.r, ncol = count)
    else {
        out <- vector("list", count)
        out[[1L]] <- r
    }
    if (m > 0) {
        i <- 2L
        nmmp1 <- n - m + 1L
        while (a[1L] != nmmp1) {
            if (e < n - h) {
                h <- 1L
                e <- a[m]
                j <- 1L
            }
            else {
                e <- a[m - h]
                h <- h + 1L
                j <- 1L:h
            }
            a[m - h + j] <- e + j
            r <- if (nofun) 
                x[a]
            else FUN(x[a], ...)
            if (simplify) 
                out[, i] <- r
            else out[[i]] <- r
            i <- i + 1L
        }
    }
    if (simplify) {
        if (is.factor(x0)) {
            levels(out) <- levels(x0)
            class(out) <- class(x0)
        }
        dim(out) <- dim.use
    }
    out
}

But the whole point of working in R is that the user doesn't have to re-invent this wheel, just find it. A good place to start is rseek.org, an R-tuned front end to Google. Try searching for "combinations" there, for example.
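
Once found, the function takes very little code to use. A small sketch of the arguments that appear in the combn signature (x, m, FUN, simplify), on toy inputs chosen only for illustration:

```r
# All 2-element combinations of 1..4, one combination per column.
pairs <- combn(4, 2)
dim(pairs)
#> [1] 2 6

# FUN is applied to each combination as it is generated.
pair_sums <- combn(4, 2, FUN = sum)

# simplify = FALSE returns a list instead of a matrix.
pair_list <- combn(c("a", "b", "c"), 2, simplify = FALSE)
length(pair_list)
#> [1] 3

# When only the count is needed, choose() avoids building anything.
choose(27, 2)
#> [1] 351
```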

This is an illuminating way of thinking about things. Many thanks!
