I was doing my assignment, and I found something strange.
I wrote this code for question #1:
x <- heights$height[heights$sex=="Male"]
The next question reads:
"We will define a function "CDF" as follows:
CDF <- function(a) {mean(x<=a)}
Explain why the CDF function is a cumulative distribution function."
I get the idea of a cumulative distribution function, but I don't get why the mean() function is used there.
For example, CDF(70) equals 0.623..., which is the cumulative probability at 70. How does mean() produce a probability in this function?
Hi, I think the key here is to work through it step by step.
What will happen when you enter x<=a? You will get a logical vector of TRUE/FALSE values.
Then you take the mean of it. This gives you a proportion since TRUE is treated as 1 and FALSE is treated as 0.
set.seed(12345)
x <- rnorm(20)
# Let's say we want the probability that x <= a, where a = 0
x
#> [1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875 -1.8179560
#> [7] 0.6300986 -0.2761841 -0.2841597 -0.9193220 -0.1162478 1.8173120
#> [13] 0.3706279 0.5202165 -0.7505320 0.8168998 -0.8863575 -0.3315776
#> [19] 1.1207127 0.2987237
x<=0
#> [1] FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
#> [13] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
mean(x<=0) # this really gives us a proportion because it is the mean of 0s and 1s
#> [1] 0.5
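To tie this back to the CDF function from the assignment, here is a minimal sketch. It assumes simulated data as a stand-in for the heights vector (I don't have that dataset loaded here), and the names below, count_below, and prop_manual are just illustrative. It checks that mean(x <= a) is the same as counting the TRUE values and dividing by the number of observations, and that it agrees with base R's ecdf().

```r
set.seed(12345)
x <- rnorm(20)  # simulated stand-in for heights$height (assumption)

# Same definition as in the assignment
CDF <- function(a) mean(x <= a)

a <- 0
below <- x <= a                         # logical vector: TRUE where x_i <= a
count_below <- sum(below)               # TRUE counts as 1, FALSE as 0
prop_manual <- count_below / length(x)  # proportion of values <= a

# The mean of a 0/1 vector is exactly this proportion
all.equal(CDF(a), prop_manual)

# Base R's empirical CDF should give the same value at a
all.equal(CDF(a), ecdf(x)(a))

# Evaluating at increasing values of a gives non-decreasing proportions
sapply(c(-1, 0, 1), CDF)
```

So with your data, CDF(70) is simply the fraction of male heights that are at most 70, which is the empirical estimate of the probability that a height is <= 70.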