Question about probability in R

Andrzej · September 13, 2022, 5:47am

I would like to learn how to solve puzzles like this one in R.
It comes from this video:

https://www.youtube.com/watch?v=8idr1WZ1A7Q&list=PLnjAqe8SsGpR2XVjG0Zf7UZ5-ABwhFPnV
It is basically about the probability of getting 48 positive reviews out of 50 when the probability of a positive review is 0.95, and about simulating this 10000 times.
How can I do this in R?
I found the following on SO, and my intuition tells me that I have to adapt it somehow. Am I right?
https://stackoverflow.com/questions/64822613/probability-of-x-elements-chosen-from-the-same-group

nirgrahamuk · September 13, 2022, 9:07am

choose(50,48)*.95^48*(1-.95)^2

sidenote: There is no simulation involved in this; simulation would be an alternative to this analysis.

A function could be written to generalise

p_given_s <- function(pos,neg,s){
  choose(pos + neg,pos) *(s^pos)*((1-s)^neg)
}

p_given_s(48,2,.95)
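
As a sanity check, the hand-written formula can be compared with R's built-in dbinom(). This is only a sketch: p_given_s is repeated from above so that the snippet runs on its own, and the names manual and builtin are just illustrative.

```r
# Hand-written binomial probability, repeated from the post above
p_given_s <- function(pos, neg, s) {
  choose(pos + neg, pos) * (s^pos) * ((1 - s)^neg)
}

manual  <- p_given_s(48, 2, 0.95)
builtin <- dbinom(x = 48, size = 50, prob = 0.95)

# all.equal() compares the two values with a small numerical tolerance
isTRUE(all.equal(manual, builtin))
```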

FactOREO · September 13, 2022, 9:20am

Hello,

what you have inserted is just a usual binomial distribution and not a typical conditional distribution. So your given formula is just the probability of drawing 48 successes out of 50 trials, with a probability of success equal to 0.95.

In this case, suppose a random variable X with X \sim Bin(50,0.95). Then, you would like to know the probability P(X = 48). This is nothing other than the probability mass function of X evaluated at 48 (i.e. f_X(48)), which can be calculated in R with the function dbinom():

dbinom(48,50,.95)
#> [1] 0.2611014

Created on 2022-09-13 by the reprex package (v2.0.1)
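
To see where 48 sits within the whole distribution, one could evaluate dbinom() at every possible count from 0 to 50. This is a minimal sketch and the variable names counts and probs are only illustrative.

```r
counts <- 0:50
probs  <- dbinom(x = counts, size = 50, prob = 0.95)

# The probabilities of all possible outcomes add up to 1
sum(probs)

# The most likely number of positive reviews
counts[which.max(probs)]

# Probability of exactly 48 positive reviews, same value as dbinom(48, 50, .95)
probs[counts == 48]
```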

As @nirgrahamuk mentioned, there is no simulation involved (and none is needed), since this is plain probability theory. If you want to know more about the binomial distribution, you may want to visit Binomial distribution - Wikipedia.

Kind regards

Andrzej · September 13, 2022, 10:36am

Thank you both, @nirgrahamuk for the custom function and @FactOREO for the explanation.

I have a few additional questions, if I may:
dbinom is a simple solution here, and the result is the same as on the screenshot in my first post.

This is exactly what I was looking for while watching that YT video. How do I know that I should use dbinom? What could point me to it in the first place?

Reading about dbinom() from help or:
https://www.rdocumentation.org/packages/stats/versions/3.3/topics/Binomial

I can see now that there is an "x" argument and a "size" argument.

How do I know that x should be 48 here?
What does size = 50 mean, given that the total number of reviews (the sample size) is 50?

Is "size" the number of trials?

For example, we have 100 students each throwing a coin 5 times. What is the probability of getting heads?
Or we have 5 students each throwing a coin 100 times. What is the probability of getting tails?
Is the "size" argument the number of throws of a single coin?
Is "n" the number of students?

This is a bit complicated, and with rbinom we additionally have the n and prob arguments.

I find this naming confusing rather than intuitive:

https://stackoverflow.com/questions/65809470/parameters-of-rbinom-in-r

https://stackoverflow.com/questions/31643165/r-rbinom-what-does-the-probability-of-success-define-if-there-is-n-number-of

Is there a more intuitive explanation of how to understand the n, size and prob arguments?

FactOREO · September 13, 2022, 10:53am

Your original question is about the basics of probability theory/statistics. The function used here, dbinom, stands for the density function of the binomial distribution (for a discrete distribution like this one, the probability mass function). Likewise there are qbinom (quantile function or generalized inverse), rbinom (random draws from the specified binomial distribution) and pbinom (the value of the cumulative distribution function).
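
The four prefixes can be tried side by side on the distribution from this thread. This is only a sketch; the variables size and prob and the seed are chosen for illustration.

```r
size <- 50
prob <- 0.95

# d: probability of exactly 48 successes, P(X = 48)
dbinom(48, size, prob)

# p: cumulative probability of at most 48 successes, P(X <= 48)
pbinom(48, size, prob)

# The cumulative value is the sum of the point probabilities from 0 to 48
all.equal(pbinom(48, size, prob), sum(dbinom(0:48, size, prob)))

# q: smallest count k with P(X <= k) >= 0.5 (the median)
qbinom(0.5, size, prob)

# r: five random draws from the same distribution
set.seed(1)  # arbitrary seed, only for reproducibility
rbinom(5, size, prob)
```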

So you basically have to understand the basics of statistics, like discrete and continuous distributions, what kind of distribution functions do exist and what are different types of distributions. If you know about different distributions, you will be able to come up with solutions for the questions below (in this case knowledge about binomial distribution).

Good luck on your further adventure throughout the world of statistics and probabilities

Andrzej · September 13, 2022, 11:23am

I know these functions and their variants for specific distributions. I still need an answer to my previous post, please, and I would be very grateful for one.

FactOREO · September 13, 2022, 11:46am

I think you refer to the questions about the arguments, rather than your ambiguous example questions. Regarding dbinom(), size means the number of trials (since in a binomial setting, you have n trials) and x refers to the number of successful trials. The argument prob means the probability of success. In your initial question, you basically asked "What is the probability of 48 out of 50 trials being successful, if the probability of success is equal to 95%?".

In rbinom() you want to simulate random outcomes of a given binomial distribution. Here, the argument n is equal to the number of simulations (e.g. 5 means you want to repeat the random drawing 5 times), the argument size refers to the number of trials (as stated by the documentation) and prob is again the probability for success. So say you want to know how often there is heads, if 5 students each throw a coin 100 times. Then you can do this with rbinom() as follows:

rbinom(n = 5, size = 100, prob = 0.5)
#> [1] 44 52 50 48 55

Created on 2022-09-13 by the reprex package (v2.0.1)

Here, student 1 had 44 times heads (or tails, if you treat success as tails, whatever pleases you) and so on. That's all about rbinom() basically.
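
A small sketch may make the roles of n and size easier to see: n fixes how many numbers come back, and size fixes the range each number can take. The seed and the name heads_per_student are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the draws can be reproduced

# 5 students, each tossing a fair coin 100 times
heads_per_student <- rbinom(n = 5, size = 100, prob = 0.5)

length(heads_per_student)  # one value per student, so 5 values
range(heads_per_student)   # each value is a count between 0 and 100

# Share of heads for each student
heads_per_student / 100
```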

Your example questions are ambiguous in my opinion, since they are not fully clear. Take the first one (100 students each throwing a coin 5 times, asking for the probability of heads):

This would require 100 observations of the results, which we don't have. If you have those observations, you will have to use a statistical test to verify, that the coin doesn't have a 50:50 chance to throw heads or tails. But you won't be able to obtain the correct probability of the coin, since in hypothesis testing there is only significant rejection, not confirmation.

Maybe this helped you to understand the function arguments of dbinom() and rbinom() a bit better.

Kind regards

Andrzej · September 13, 2022, 12:06pm

Thank you very much indeed, this is very helpful.

If I reverse a bit your code:

rbinom(n =100, size = 5, prob = 0.5)

what is it that I get as a result of running this function ?

FactOREO · September 13, 2022, 12:53pm

Well, if you try this yourself you will find out, that you will get a vector with 100 simulation results, each by tossing 5 independent coins.
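
For illustration, the 100 results could be tabulated and set next to the counts that dbinom() would predict for 100 students. This is a sketch with an arbitrary seed; observed and expected are just illustrative names.

```r
set.seed(123)  # arbitrary seed

# 100 students, each tossing a fair coin 5 times
heads <- rbinom(n = 100, size = 5, prob = 0.5)

# How many students got 0, 1, ..., 5 heads
observed <- table(factor(heads, levels = 0:5))
observed

# Counts expected in theory for 100 students
expected <- 100 * dbinom(0:5, size = 5, prob = 0.5)
round(expected, 1)
```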

Andrzej · September 13, 2022, 1:05pm

Indeed, so is my interpretation correct:
I have got here 100 students throwing a coin 5 times (or throwing 5 independent coins once each ?) and as result I got:

that probability of getting heads(success) for first student is 2 out of 5 is 0.4 so 40%, for the second student that would be 3/5 = 0.6 meaning 60% and so on and on. Is it correct ?

nirgrahamuk · September 13, 2022, 1:13pm

You have 100 students. For the first student, 2 out of 5 coins landed one way rather than the other, so the proportion of that student's flips landing that way is 2/5 or 40%. The probability of tossing a coin and having it land a particular way was set by you when you chose prob=0.5.
To summarise, this analysis does not support an assertion that there is a 40% probability of the first student getting 2 out of 5. It says that, having done one simulation of the students' activities, these are the results. If you did another simulation, you might get the same or different results.

Andrzej · September 13, 2022, 1:32pm

Thank you,

Could you please show an example of how to do it?

nirgrahamuk · September 13, 2022, 1:38pm

like this for example

# simulate 10000 batches of 50 reviews, each positive with probability 0.95
sims_n <- rbinom(n = 10000, size = 50, prob = 0.95)

# proportion of simulations with exactly 48 successes
mean(sims_n == 48)
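
To judge how close such a simulation gets, the simulated proportion can be put next to the exact value from dbinom(). The following is a sketch under assumptions: the seed is arbitrary, and the standard error uses the usual formula for a proportion.

```r
set.seed(2022)  # arbitrary seed

n_sims <- 10000
sims <- rbinom(n = n_sims, size = 50, prob = 0.95)

# Simulated estimate of P(X = 48)
estimate <- mean(sims == 48)

# Exact value for comparison
exact <- dbinom(48, size = 50, prob = 0.95)

# Rough standard error of the simulated proportion
std_error <- sqrt(exact * (1 - exact) / n_sims)

c(estimate = estimate, exact = exact, std_error = std_error)
```

With 10000 simulations the estimate should usually land within a few standard errors of the exact value, and a different seed gives a slightly different estimate.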

Andrzej · September 13, 2022, 4:41pm

Thank you very much again to both of you for your kind explanations and patience.
I found an interesting collection of probability puzzles with solutions, for example:

"Five foxes and seven hounds run into a foxhole. While they're inside they get all jumbled up, so that all orderings are equally likely.
The foxes and hounds run out of the hole in a neat line. On average, how many foxes are immediately followed by a hound?"

https://github.com/atorch/probability_puzzles_solutions
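
As an illustration, the foxes-and-hounds puzzle can be approached with the same simulate-then-compare idea. This is my own sketch and not the solution from the linked repository; the helper count_fox_then_hound and the seed are hypothetical choices.

```r
set.seed(7)  # arbitrary seed

animals <- c(rep("fox", 5), rep("hound", 7))

# Count foxes that are immediately followed by a hound in one random line-up
count_fox_then_hound <- function() {
  lineup <- sample(animals)          # random ordering of the 12 animals
  first  <- lineup[-length(lineup)]  # positions 1..11
  second <- lineup[-1]               # positions 2..12
  sum(first == "fox" & second == "hound")
}

simulated <- mean(replicate(10000, count_fox_then_hound()))

# Linearity of expectation: 11 adjacent pairs, each being fox-then-hound
# with probability (5/12) * (7/11)
exact <- 11 * (5 / 12) * (7 / 11)

c(simulated = simulated, exact = exact)
```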

For anybody interested it will be good reading.
best regards,

Andrzej · September 14, 2022, 9:09am

While continuing to learn about probability, I stumbled upon this nice explanation:

https://stats.stackexchange.com/questions/336166/what-is-the-difference-between-dbinom-and-dnorm-in-r

where it is said: "So, dnorm(2) gives the height of this curve at x=2 ..."

but when I do this:

plot(dnorm(2))

it gives me this plot:

So my question is: why is that dot at 1.0 on the x axis? What does "Index" mean here? Can't it be at 2.0?

nirgrahamuk · September 14, 2022, 10:13am

That's how R's plot() works: if you give it only one variable, it assumes that it is the y variable and uses the indices of that variable to form the x axis.
You can do

plot(x=2,y=dnorm(2))

or more elaborate

plot(x=seq(0,3,by=.1),y=dnorm(seq(0,3,by=.1)))
points(x=2,y=dnorm(2),col="red", pch =16)

mfeinleib · September 21, 2022, 12:02am

You know the binomial distribution. I would suggest that this is a question that you could answer yourself by experimenting with the dbinom function.

Right on. size represents the total number of trials.

Basically, my advice is to be brave and try to discover these smaller things on your own. Good luck!
