Statistics cheat sheet for computing probabilities of discrete and continuous random variables

Is there a cheat sheet that addresses probabilities, for example how to work with discrete and continuous random variables, along with a guide to the units you would need to do so?

A stats and R beginner,

Thanks

Welcome to the quest.

Starting from ground zero in stats and R at the same time is a challenge. My stats were self taught and three decades unused when I took up R, so I understand.

Start by thinking about the difference between discrete and continuous variables. A discrete variable can take on only one of two values, usually {1,0}. A continuous variable can take on any real number (any value in ℝ), but in practice we record it to some number of decimal places, so that we may see 3.14, 3.141, 3.1415, etc., but never "all the way out". Then there are categorical variables, such as {1,2,3,4}.
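To make the "never all the way out" point concrete: for a continuous variable, the probability of any single exact value is zero, so a recorded value such as 3.14 really stands for a small interval, and its probability is an area under the density. A minimal R sketch, assuming purely for illustration that the variable is normal with mean 3 and standard deviation 0.5 (these numbers are not from the thread):

```r
# Hypothetical continuous variable: X ~ Normal(mean = 3, sd = 0.5)
mu    <- 3
sigma <- 0.5

# A value recorded as 3.14 (two decimal places) covers roughly [3.135, 3.145)
lower <- 3.135
upper <- 3.145

# P(lower <= X < upper) is the area under the normal density between the bounds
p_interval <- pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
print(p_interval)

# Recording one more decimal place (3.141) narrows the interval and shrinks the probability
p_narrower <- pnorm(3.1415, mean = mu, sd = sigma) - pnorm(3.1405, mean = mu, sd = sigma)
print(p_narrower)
```

Carrying on to more decimal places drives the probability toward zero, which is why continuous probabilities are always stated for intervals.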

So, those sets are the "objects" on which R is called to work. (In R, everything, including functions, is an object.) There is, strictly speaking, no "probability" of those objects, as such, except in some relationship to another object.

Formally, this is expressed as E[y|x]. What is the probability of y given x?

That is, if y is a discrete or continuous variable and x is one or more other discrete, continuous or categorical variables, what is the probability of y in the presence of x?

All of which is to say that the question calls for a roadmap to cheat sheets, depending on x, y and their interrelationship. For example, they may be linearly related, in which case there are tools such as ordinary least squares regression and logistic regression.
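As a starting point for that roadmap, here is a minimal sketch of the two models named, fit in base R to simulated data (the data, coefficients, and variable names are made up for illustration): `lm()` for a continuous y and `glm()` with `family = binomial` for a 0/1 y.

```r
set.seed(42)
n <- 1000
x <- rnorm(n)

# Continuous outcome: simulated with true intercept 2 and slope 3 plus normal noise
y_cont <- 2 + 3 * x + rnorm(n)
ols <- lm(y_cont ~ x)
print(coef(ols))

# Binary (0/1) outcome: simulated with P(y = 1 | x) = plogis(-0.5 + 1.5 * x)
y_bin <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
logit <- glm(y_bin ~ x, family = binomial)
print(coef(logit))

# Predictions on the response scale are estimated probabilities P(y = 1 | x)
p_hat <- predict(logit, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
print(p_hat)
```

`lm()` models E[y|x] as a straight line in x; the logistic model estimates P(y = 1 | x), which for a 0/1 outcome is the same quantity as E[y|x].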

Questions are harder than answers. Rethink the question and answers will come.

If a bit of nit-picking may be forgiven...

A discrete variable can have any countable number of values, not just two.
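For example, a binomial count can take any of 0 to `size`, and a Poisson count any of 0, 1, 2, ... with no upper limit. Base R's `dbinom()` and `dpois()` return the probability of each exact value; a short sketch with arbitrarily chosen parameters:

```r
# Binomial: number of successes in 10 trials with success probability 0.3 (illustrative values)
k <- 0:10
pmf_binom <- dbinom(k, size = 10, prob = 0.3)
print(round(pmf_binom, 4))
print(sum(pmf_binom))   # probabilities over all possible values sum to 1

# Probability of exactly 3 successes, and of at most 3 (cumulative)
p_exactly_3 <- dbinom(3, size = 10, prob = 0.3)
p_at_most_3 <- pbinom(3, size = 10, prob = 0.3)

# Poisson: counts 0, 1, 2, ... with no upper limit (mean 2 chosen for illustration)
p_pois_0_to_4 <- dpois(0:4, lambda = 2)
print(p_pois_0_to_4)
```

Unlike the continuous case, each individual value of a discrete variable can carry positive probability.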

E[y|x] is not the probability of y given x; it's the expected value, the conditional mean if you like.
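To see the difference in R, here is a sketch with simulated data (hypothetical, for illustration only) in which y depends on a categorical x. E[y|x] is then just the mean of y within each level of x, which `tapply()` computes directly, while a conditional probability is a different quantity:

```r
set.seed(1)
# Hypothetical categorical predictor with three levels
x <- sample(c("a", "b", "c"), size = 3000, replace = TRUE)
# True conditional means chosen for illustration: E[y|a] = 1, E[y|b] = 2, E[y|c] = 5
true_means <- c(a = 1, b = 2, c = 5)
y <- true_means[x] + rnorm(length(x))

# Estimated conditional mean E[y | x] for each level of x
cond_mean <- tapply(y, x, mean)
print(cond_mean)

# By contrast, a conditional probability such as P(y > 3 | x)
cond_prob <- tapply(y > 3, x, mean)
print(cond_prob)
```

The mean of a logical vector is the proportion of `TRUE` values, which is why `tapply(y > 3, x, mean)` estimates a probability.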

The common probability distributions have functions in the R language beginning with p or q.
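Alongside the p and q functions, base R's stats package gives each distribution d and r variants with the same suffix: d for the density (or probability mass), p for the cumulative probability, q for the quantile, and r for random draws. A brief sketch for the normal (`norm`) and binomial (`binom`) families, with arbitrary example parameters:

```r
# Normal distribution: standard normal unless mean/sd are given
dnorm(0)                 # density at 0
pnorm(0)                 # P(Z <= 0)
qnorm(0.5)               # value with 50% of the area to its left
set.seed(123)
draws <- rnorm(5, mean = 10, sd = 2)   # five random draws
print(draws)

# The same prefixes work for other distributions, e.g. the binomial
dbinom(2, size = 5, prob = 0.5)        # P(X = 2)
pbinom(2, size = 5, prob = 0.5)        # P(X <= 2)
qbinom(0.5, size = 5, prob = 0.5)      # median number of successes
rbinom(3, size = 5, prob = 0.5)        # three random counts
```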

pnorm(q, mean, sd) returns the area to the left of a given value q in the normal distribution.
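A few illustrative `pnorm()` calls; the mean of 100 and sd of 15 are example values, not from the thread. Setting `lower.tail = FALSE` gives the area to the right instead, and a difference of two `pnorm()` calls gives the probability of an interval:

```r
# Standard normal (mean = 0, sd = 1 by default): area to the left of 1.96
left_area <- pnorm(1.96)
print(left_area)

# Normal with mean 100 and sd 15: P(X <= 115), i.e. one sd above the mean
p_below_115 <- pnorm(115, mean = 100, sd = 15)

# Area to the right of 115: P(X > 115)
p_above_115 <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)

# Probability of falling between 85 and 115 (within one sd of the mean)
p_between <- pnorm(115, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)
print(c(below = p_below_115, above = p_above_115, between = p_between))
```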

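qnorm(p, mean, sd) returns the value at the pth quantile of the normal distribution with that mean and standard deviation; the result is a Z-score only when mean = 0 and sd = 1 (the defaults).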
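Going the other way, `qnorm()` inverts `pnorm()`. A short sketch (the 0.975 level and the mean/sd of 100 and 15 are illustrative choices):

```r
# Standard normal: the z value with 97.5% of the area to its left (about 1.96)
z_975 <- qnorm(0.975)
print(z_975)

# Non-standard normal: the quantile comes back on the original scale, not as a Z-score
x_975 <- qnorm(0.975, mean = 100, sd = 15)
print(x_975)

# Standardizing that value recovers the standard-normal quantile
z_back <- (x_975 - 100) / 15

# qnorm and pnorm are inverses of each other
round_trip <- pnorm(qnorm(0.3))
```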
