 Apply binomial-test for each row in a data/table

Hi!

I have some trouble to perform a binomial-test.
I've created a table (named es "Group") which looks like this: My goal is to show the confidence intervalls for each rows (which I want to test) in a extra colum of my data "Group".

At first I was trying with this code:

``````Group[,"BinomTest"] <- binom.test(Group\$Defaults, Group\$Count,
Group\$PD,alternative =  "two.sided", conf.level = 0.90 )
``````

and I failed.

On my second thought I was thinking about a function and at least tried to create it by:

``````bt <- function(x,n,p) {binom.test(x,n,p, alternative = c("two.sided"), conf.level = 0.90)\$conf.int}

Group[,"BinomTest"] <- mapply(bt, Group)
``````

But also failed.

Can please someone tell me how I can manage to add a extra column which shows me for every row the confidence intervall by the binomtest?

I am grateful for very help!

I am thanking you in advance

Confidence interval will be two numbers, so I don't think it's possible to store in a single column as numbers. You can store them in characters though. In the following example, I stored in the form `LCB - UCB`. I hope you can modify this example in your preferable format.

``````Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

Group\$BinomTest <- apply(X = Group,
MARGIN = 1,
FUN = function(t)
{
ci <- binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90)\$conf.int
paste0(round(x = ci,
digits = 4),
collapse = " - ")
})

Group
#>   Rating Count Defaults PD       BinomTest
#> 1      1   266       20  5 0.0504 - 0.1074
#> 2      2   224       23  6 0.0712 - 0.1423
#> 3      3   205       22  9 0.0738 - 0.1497
#> 4      4   228       22 15 0.0662 - 0.1349
#> 5      5   211       30 20 0.1042 - 0.1879
``````

Created on 2019-05-31 by the reprex package (v0.3.0)

And, by the way, I stored them in raw limits. You can easily change them to percentages.

If using two columns is an option, then I'd like to add another option using development version of `tidyr`.

``````library(magrittr)

Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

Group %>%
dplyr::mutate(CI = purrr::pmap(.l = .,
.f = purrr::lift_vd(..f = function(t)
{
ci <- binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90)\$conf.int
names(x = ci) <- c("LCB", "UCB")
ci
}))) %>%
tidyr::unnest_wider(CI)
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188
``````

Here, I've used `lift_vd` because I don't know how to write functions with more than one step in the form `~ ...`, and somehow I can't use normal functions in `pmap` without `lift` inside `pmap`.

Andres, can I request you to help me in this matter and improve this solution?

1 Like

thank you very much @Yarnabrina

You helped me really a lot with this code!

It works wonderful : )

Another option using `dplyr`

``````Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

library(dplyr)

binom_CI <- function(x, n, p, l) {
CI <- binom.test(x, n, p/100, alternative =  "two.sided", conf.level = 0.90)\$conf.int[l]
return(CI)
}

Group %>%
rowwise() %>%
mutate(LCB = binom_CI(Defaults, Count, PD, 1),
UCB = binom_CI(Defaults, Count, PD, 2))
#> Source: local data frame [5 x 6]
#> Groups: <by row>
#>
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188
``````

EDIT:
Even another one

``````library(tidyverse)

Group %>%
rowwise() %>%
mutate(CI = list(enframe(binom.test(Defaults, Count, PD/100, alternative =  "two.sided", conf.level = 0.90)\$conf.int))) %>%
unnest(CI) %>%
rename("LCB" = "1", "UCB" = "2")
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188
``````
1 Like

I am afraid I still have a question.

I was trying to add the p-values in a seperate column, which I tried like this:

`Group\$pValue<- apply(X = Group, MARGIN = 1, FUN = function_PV)`

with

``````function_PV <- function(t){
pv <- binom.test(x = t, n = t, p = t,
alternative = "two.sided", conf.level = 0.90)\$p.value
paste0(x = pv)}
``````

But I get this error message:

Error in round(x) : non-numeric argument to mathematical function

I can't tell why this happens, because I think it gathers all numeric values?`

Yeah, sure

``````library(tidyverse)

Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

Group %>%
mutate(CI = purrr::pmap(.l = .,
.f = ~binom.test(x = ..3,
n = ..2,
p = (..4 / 100),
alternative = "two.sided",
conf.level = 0.90)\$conf.int
)) %>%
unnest_wider(CI) %>%
rename("LCB" = ...1, "UCB" = ...2)
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188
``````

The manual renaming still bothers me, it should be a better way

EDIT: Actually `purrr:pmap()` is not needed maybe it would be more readable just using `mutate()`

``````Group %>%
rowwise() %>%
mutate(CI = list(binom.test(Defaults, Count, PD/100, alternative =  "two.sided", conf.level = 0.90)\$conf.int)) %>%
unnest_wider(CI) %>%
rename("LCB" = "...1", "UCB" = "...2")
#> # A tibble: 5 x 6
#>   Rating Count Defaults    PD    LCB   UCB
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>
#> 1      1   266       20     5 0.0504 0.107
#> 2      2   224       23     6 0.0712 0.142
#> 3      3   205       22     9 0.0738 0.150
#> 4      4   228       22    15 0.0662 0.135
#> 5      5   211       30    20 0.104  0.188
``````

Here's a tidyverse solution:

``````library(magrittr)

Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

Group %>%
dplyr::mutate(temp = purrr::pmap(.l = .,
.f = purrr::lift_vd(..f = function(t)
{
with(data = binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90),
expr = list(LCB = conf.int,
UCB = conf.int,
PV = p.value))
}))) %>%
tidyr::unnest_wider(col = temp)
#> # A tibble: 5 x 7
#>   Rating Count Defaults    PD    LCB   UCB     PV
#>    <int> <dbl>    <dbl> <dbl>  <dbl> <dbl>  <dbl>
#> 1      1   266       20     5 0.0504 0.107 0.0663
#> 2      2   224       23     6 0.0712 0.142 0.0110
#> 3      3   205       22     9 0.0738 0.150 0.392
#> 4      4   228       22    15 0.0662 0.135 0.0254
#> 5      5   211       30    20 0.104  0.188 0.0384
``````

As to why your code fails, you code works fine for the first run, if you change `t` to `t/100`. If you run again, probably you'll get the same error, because then one element of the row is `character` because of `paste0`. If you skip that (it's unnecessary), it'll work perfectly. See below:

``````Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

function_PV <- function(t)
{
binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90)\$p.value
}

Group\$pValue<- apply(X = Group, MARGIN = 1, FUN = function_PV)

Group
#>   Rating Count Defaults PD     pValue
#> 1      1   266       20  5 0.06630413
#> 2      2   224       23  6 0.01097650
#> 3      3   205       22  9 0.39154014
#> 4      4   228       22 15 0.02544170
#> 5      5   211       30 20 0.03842723
``````

Hi Yarnabrina,

Thank you for your help and I am sorry for my late response, because I was very ill. However,
if I am running this code

``````Group <- data.frame(Rating = 1:5,
Count = c(266, 224, 205, 228, 211),
Defaults = c(20, 23, 22, 22, 30),
PD = c(5, 6, 9, 15, 20))

function_CL <- function(t)
{
ci <- binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90)\$conf.int
paste0(round(x = ci,
digits = 4),
collapse = " - ")

}

function_PV <- function(t)
{
binom.test(x = t,
n = t,
p = (t / 100),
alternative = "two.sided",
conf.level = 0.90)\$p.value
}

Group\$CL<- apply(X = Group, MARGIN = 1, FUN = function_CL)

Group\$PV<- apply(X = Group, MARGIN = 1, FUN = function_PV)
``````

It will not work for the last statement "Group\$PV". But If you run it seperatly, say you run Group\$PV first an then Group\$CL then it works totally fine, which I don't really understand?

Sorry for bothering.

With kind regards

Don't worry about bothering me or replying late. I hope you're well now, and wish you a speedy recovery.

Actually, I tried to answer this very question in my last post, but apparently I failed. I'm not good at explaining things. Let me try once again.

Suppose I run your code (in post 9) upto the penultimate line. Since `function_CL` uses `paste`, `CL` column is `character`, as `paste` returns a character vector.

Now, before running the lat line, `Group` is a `data.frame`, and it's okay for different columns to be of different type. But while you're using `apply` rowwise, last entry of each row is `character` and it forces all elements to become characters. So, they do not remain `numeric` any longer while they are being supplied to `function_PV`.

``````> Group1 <- Group
>
> lapply(X = Group1,
+        FUN = typeof) # checking type of each column
\$Rating
 "integer"

\$Count
 "double"

\$Defaults
 "double"

\$PD
 "double"

>
> Group1\$CL<- apply(X = Group1,
+                   MARGIN = 1,
+                   FUN = function_CL)
>
> lapply(X = Group1,
+        FUN = typeof) # checking type of each column
\$Rating
 "integer"

\$Count
 "double"

\$Defaults
 "double"

\$PD
 "double"

\$CL
 "character"

>
> Group1\$PV<- apply(X = Group1,
+                   MARGIN = 1,
+                   FUN = function_PV)
Error in round(x) : non-numeric argument to mathematical function >
> apply(X = Group1,
+       MARGIN = 1,
+       FUN = typeof) # checking type of each row
 "character" "character" "character" "character" "character"
``````

On the other hand, if you interchange the last two lines, this problem won't happen, as `function_PV` returns real numbers.

``````> Group2 <- Group
>
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
\$Rating
 "integer"

\$Count
 "double"

\$Defaults
 "double"

\$PD
 "double"

>
> Group2\$PV<- apply(X = Group2,
+                   MARGIN = 1,
+                   FUN = function_PV)
>
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
\$Rating
 "integer"

\$Count
 "double"

\$Defaults
 "double"

\$PD
 "double"

\$PV
 "double"

>
> apply(X = Group2,
+       MARGIN = 1,
+       FUN = typeof) # checking type of each row
 "double" "double" "double" "double" "double"
>
> Group2\$CL<- apply(X = Group2,
+                   MARGIN = 1,
+                   FUN = function_CL)
>
> lapply(X = Group2,
+        FUN = typeof) # checking type of each column
\$Rating
 "integer"

\$Count
 "double"

\$Defaults
 "double"

\$PD
 "double"

\$PV
 "double"

\$CL
 "character"
``````
1 Like

Thank you so much for your explanation!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.