Apply binomial-test for each row in a data/table

sam1 · May 31, 2019, 12:44pm

Hi!

I have some trouble to perform a binomial-test.
I've created a table (named es "Group") which looks like this:

My goal is to show the confidence intervalls for each rows (which I want to test) in a extra colum of my data "Group".

At first I was trying with this code:

Group[,"BinomTest"] <- binom.test(Group$Defaults, Group$Count,
                                  Group$PD,alternative =  "two.sided", conf.level = 0.90 )

and I failed.

On my second thought I was thinking about a function and at least tried to create it by:

bt <- function(x,n,p) {binom.test(x,n,p, alternative = c("two.sided"), conf.level = 0.90)$conf.int}

Group[,"BinomTest"] <- mapply(bt, Group)

But also failed.

Can please someone tell me how I can manage to add a extra column which shows me for every row the confidence intervall by the binomtest?

I am grateful for very help!

I am thanking you in advance

Yarnabrina · May 31, 2019, 1:29pm

Confidence interval will be two numbers, so I don't think it's possible to store in a single column as numbers. You can store them in characters though. In the following example, I stored in the form LCB - UCB. I hope you can modify this example in your preferable format.

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

Group$BinomTest <- apply(X = Group,
                         MARGIN = 1,
                         FUN = function(t)
                         {
                           ci <- binom.test(x = t[3],
                                            n = t[2],
                                            p = (t[4] / 100),
                                            alternative = "two.sided",
                                            conf.level = 0.90)$conf.int
                           paste0(round(x = ci,
                                        digits = 4),
                                  collapse = " - ")
                         })

Group
#>   Rating Count Defaults PD       BinomTest
#> 1      1   266       20  5 0.0504 - 0.1074
#> 2      2   224       23  6 0.0712 - 0.1423
#> 3      3   205       22  9 0.0738 - 0.1497
#> 4      4   228       22 15 0.0662 - 0.1349
#> 5      5   211       30 20 0.1042 - 0.1879

^{Created on 2019-05-31 by the reprex package (v0.3.0)}

And, by the way, I stored them in raw limits. You can easily change them to percentages.

If using two columns is an option, then I'd like to add another option using development version of tidyr.

library(magrittr)

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

Group %>%
  dplyr::mutate(CI = purrr::pmap(.l = .,
                                 .f = purrr::lift_vd(..f = function(t)
                                 {
                                   ci <- binom.test(x = t[3],
                                                    n = t[2],
                                                    p = (t[4] / 100),
                                                    alternative = "two.sided",
                                                    conf.level = 0.90)$conf.int
                                   names(x = ci) <- c("LCB", "UCB")
                                   ci
                                 }))) %>%
  tidyr::unnest_wider(CI)
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188

Here, I've used lift_vd because I don't know how to write functions with more than one step in the form ~ ..., and somehow I can't use normal functions in pmap without lift inside pmap.

Andres, can I request you to help me in this matter and improve this solution?

Edit

Here's a tidyverse solution:

library(magrittr)

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

Group %>%
  dplyr::mutate(temp = purrr::pmap(.l = .,
                                   .f = purrr::lift_vd(..f = function(t)
                                   {
                                     with(data = binom.test(x = t[3],
                                                            n = t[2],
                                                            p = (t[4] / 100),
                                                            alternative = "two.sided",
                                                            conf.level = 0.90),
                                          expr = list(LCB = conf.int[1],
                                                      UCB = conf.int[2],
                                                      PV = p.value))
                                   }))) %>%
  tidyr::unnest_wider(col = temp)
#> # A tibble: 5 x 7
#>   Rating Count Defaults    PD    LCB   UCB     PV
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>  <dbl>
#> 1      1   266       20     5 0.0504 0.107 0.0663
#> 2      2   224       23     6 0.0712 0.142 0.0110
#> 3      3   205       22     9 0.0738 0.150 0.392 
#> 4      4   228       22    15 0.0662 0.135 0.0254
#> 5      5   211       30    20 0.104  0.188 0.0384

As to why your code fails, you code works fine for the first run, if you change t[4] to t[4]/100. If you run again, probably you'll get the same error, because then one element of the row is character because of paste0. If you skip that (it's unnecessary), it'll work perfectly. See below:

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

function_PV <- function(t)
{
  binom.test(x = t[3],
             n = t[2],
             p = (t[4] / 100),
             alternative = "two.sided",
             conf.level = 0.90)$p.value
}

Group$pValue<- apply(X = Group, MARGIN = 1, FUN = function_PV)

Group
#>   Rating Count Defaults PD     pValue
#> 1      1   266       20  5 0.06630413
#> 2      2   224       23  6 0.01097650
#> 3      3   205       22  9 0.39154014
#> 4      4   228       22 15 0.02544170
#> 5      5   211       30 20 0.03842723

sam1 · May 31, 2019, 1:42pm

thank you very much @Yarnabrina

You helped me really a lot with this code!

It works wonderful : )

andresrcs · May 31, 2019, 2:00pm

Another option using dplyr

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

library(dplyr)

binom_CI <- function(x, n, p, l) {
    CI <- binom.test(x, n, p/100, alternative =  "two.sided", conf.level = 0.90)$conf.int[l]
    return(CI)
}

Group %>% 
    rowwise() %>% 
    mutate(LCB = binom_CI(Defaults, Count, PD, 1),
           UCB = binom_CI(Defaults, Count, PD, 2))
#> Source: local data frame [5 x 6]
#> Groups: <by row>
#> 
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188

EDIT:
Even another one

library(tidyverse)

Group %>% 
    rowwise() %>% 
    mutate(CI = list(enframe(binom.test(Defaults, Count, PD/100, alternative =  "two.sided", conf.level = 0.90)$conf.int))) %>% 
    unnest(CI) %>% 
    spread(name, value) %>% 
    rename("LCB" = "1", "UCB" = "2")
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188

sam1 · May 31, 2019, 3:40pm

I am afraid I still have a question.

I was trying to add the p-values in a seperate column, which I tried like this:

Group$pValue<- apply(X = Group, MARGIN = 1, FUN = function_PV)

with

function_PV <- function(t){
                pv <- binom.test(x = t[3], n = t[2], p = t[4],
                      alternative = "two.sided", conf.level = 0.90)$p.value
                      paste0(x = pv)}

But I get this error message:

Error in round(x) : non-numeric argument to mathematical function

I can't tell why this happens, because I think it gathers all numeric values?`

May I ask kindly for help again please?

andresrcs · May 31, 2019, 3:42pm

Yeah, sure

library(tidyverse)

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))

Group %>%
    mutate(CI = purrr::pmap(.l = .,
                            .f = ~binom.test(x = ..3,
                                             n = ..2,
                                             p = (..4 / 100),
                                             alternative = "two.sided",
                                             conf.level = 0.90)$conf.int
    )) %>%
    unnest_wider(CI) %>% 
    rename("LCB" = ...1, "UCB" = ...2)
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188

The manual renaming still bothers me, it should be a better way

EDIT: Actually purrr:pmap() is not needed maybe it would be more readable just using mutate()

Group %>% 
    rowwise() %>% 
    mutate(CI = list(binom.test(Defaults, Count, PD/100, alternative =  "two.sided", conf.level = 0.90)$conf.int)) %>% 
    unnest_wider(CI) %>% 
    rename("LCB" = "...1", "UCB" = "...2")
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188

sam1 · June 4, 2019, 8:53pm

Hi Yarnabrina,

Thank you for your help and I am sorry for my late response, because I was very ill. However,
if I am running this code

Group <- data.frame(Rating = 1:5,
                    Count = c(266, 224, 205, 228, 211),
                    Defaults = c(20, 23, 22, 22, 30),
                    PD = c(5, 6, 9, 15, 20))


function_CL <- function(t)
{
  ci <- binom.test(x = t[3],
             n = t[2],
             p = (t[4] / 100),
             alternative = "two.sided",
             conf.level = 0.90)$conf.int
  paste0(round(x = ci,
               digits = 4),
         collapse = " - ")
  
}

function_PV <- function(t)
{
  binom.test(x = t[3],
             n = t[2],
             p = (t[4] / 100),
             alternative = "two.sided",
             conf.level = 0.90)$p.value
}


Group$CL<- apply(X = Group, MARGIN = 1, FUN = function_CL)

Group$PV<- apply(X = Group, MARGIN = 1, FUN = function_PV)

It will not work for the last statement "Group$PV". But If you run it seperatly, say you run Group$PV first an then Group$CL then it works totally fine, which I don't really understand?

Sorry for bothering.

With kind regards

Yarnabrina · June 5, 2019, 4:55am

Don't worry about bothering me or replying late. I hope you're well now, and wish you a speedy recovery.

Actually, I tried to answer this very question in my last post, but apparently I failed. I'm not good at explaining things. Let me try once again.

Suppose I run your code (in post 9) upto the penultimate line. Since function_CL uses paste, CL column is character, as paste returns a character vector.

Now, before running the lat line, Group is a data.frame, and it's okay for different columns to be of different type. But while you're using apply rowwise, last entry of each row is character and it forces all elements to become characters. So, they do not remain numeric any longer while they are being supplied to function_PV.

> Group1 <- Group
> 
> lapply(X = Group1,
+        FUN = typeof) # checking type of each column
$Rating
[1] "integer"

$Count
[1] "double"

$Defaults
[1] "double"

$PD
[1] "double"

> 
> Group1$CL<- apply(X = Group1,
+                   MARGIN = 1,
+                   FUN = function_CL)
> 
> lapply(X = Group1,
+        FUN = typeof) # checking type of each column
$Rating
[1] "integer"

$Count
[1] "double"

$Defaults
[1] "double"

$PD
[1] "double"

$CL
[1] "character"

> 
> Group1$PV<- apply(X = Group1,
+                   MARGIN = 1,
+                   FUN = function_PV)
 Error in round(x) : non-numeric argument to mathematical function > 
> apply(X = Group1,
+       MARGIN = 1,
+       FUN = typeof) # checking type of each row
[1] "character" "character" "character" "character" "character"

On the other hand, if you interchange the last two lines, this problem won't happen, as function_PV returns real numbers.

> Group2 <- Group
> 
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
$Rating
[1] "integer"

$Count
[1] "double"

$Defaults
[1] "double"

$PD
[1] "double"

> 
> Group2$PV<- apply(X = Group2,
+                   MARGIN = 1,
+                   FUN = function_PV)
> 
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
$Rating
[1] "integer"

$Count
[1] "double"

$Defaults
[1] "double"

$PD
[1] "double"

$PV
[1] "double"

> 
> apply(X = Group2,
+       MARGIN = 1,
+       FUN = typeof) # checking type of each row
[1] "double" "double" "double" "double" "double"
> 
> Group2$CL<- apply(X = Group2,
+                   MARGIN = 1,
+                   FUN = function_CL)
> 
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
$Rating
[1] "integer"

$Count
[1] "double"

$Defaults
[1] "double"

$PD
[1] "double"

$PV
[1] "double"

$CL
[1] "character"

sam1 · June 5, 2019, 7:22pm

Thank you so much for your explanation!

system · June 12, 2019, 7:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.