how to create a continuos variable in R

How can i create a continuous variable in R? For instance imagine that f(x) = 3x^2 (for 0<x<1) is a probability density function (or it could literally be anything else) . How can i implement this variable in R? How can i do basic operations like finding the mean of f(x) or P(X=K)?

You can't really create a continuous variable in the sense that all you can create is a discrete grid. But you can certainly do something like

x <-seq(0,1,.001)
1 Like

Since you are in a programming language made for statistics, you can indeed handle continous probability distribution functions (the most common ones even with implemented functions) - but for custom functions only on a discrete grid, as @startz said.

Lets say you have a continous random variable defined as

\phi(x) = \begin{cases} \frac{1}{200} \cdot x\text{, if } 0 \le x \le 20 \\ 0\text{, otherwise} \end{cases}

You could implementent this (rather easy) function in R with the usual function approach for anything else:

my_pdf <- function(x){
 ifelse(x >= 0 & x <= 20, 1/200 * x, 0)

Using the curve() function you can plot the pdf pretty easily:

curve(expr = my_pdf, from = -5, to = 25, n = 1000)

It is however a bit more challenging to accomplish something like the cdf from a pdf without any "real" analysis. You can do so by exploiting the fact, that a cdf is nothing else then a (cumulative) sum of prior values from the pdf, multiplied by the difference between consecutive x values, because

\Phi(x) = \int_{-\inf}^{\inf} \phi(x) dx = \frac{x^2}{400} (|^{20}_{0} = 1)

holds true for a continous random variable - we basically have to do the approximation of an integral with a rather large sum. Applying this to our little toy example:

cumsum(my_pdf(seq.default(-5,25,0.1)) * 0.1) |> plot(x = seq.default(-5,25,0.1), type = 'l')

will yield a rather nice plot of our CDF and the calculated values can be ordered with corresponding x values to obtain a set of pairwise numbers to approximate our CDF:

my_cdf <- function(lower, upper, step){
  x <- seq.default(from = lower, to = upper, by = step)
  pdf <- my_pdf(x)
  cdf <- cumsum(pdf) * step
  data.frame(x = x, pdf = pdf, cdf = cdf)

This can also be used to calculate probabilities - in case of P(X = x) this would be 0 in the continous case - but for others you can use the following:

P_of_X <- function(x, greater = FALSE){
  ### We need our CDF first
  DF <- my_cdf(0,20,0.01)
  ### P(X = x) = 0
  ### P(X <= x) = F(x)
  ### P(X >= x) = 1 - F(x)
  # Since we cannot cover all values, we need to find the nearest value
  if(!(x %in% DF$x)) x <- DF[which.min(abs(x - DF$x)),]$x
  P <- DF[which(DF$x == x),]$cdf
  } else {
    1 - P

So now we are left with the expectation. Here we need random numbers from our distribution and will not come around a statistical method like the inversion method. So take the inverse of our CDF from above (which we calculated analitically) and we get \sqrt{400y} = 20 \cdot \sqrt{y} = F^{(-1)}_X(y) (ignore the negative solution).

Now we can generate random numbers from our given distribution with uniform distributed random draws between 0 and 1 and hence calculate values like the (empirical) mean:

inv_cdf <- function(x) 20 * sqrt(x)

which will be close to 13.333 as the theoretical mean tells us.

Kind regards


In the last part of @FactOREO's very nice post, he shows how to get the expected value using Monte Carlo integration through a "probability integral transformation (PIT)." An alternative is to due numerical integration. First, make a discrete version of a probability density, being sure it adds up to one. Then just multiply X by the density to get the expected value.

# suppose x is distributed uniform between a and b
pdf <- function(x){1/(b-a)}
x <- seq(a,b,length.out = 1000)
f <- pdf(x)/sum(pdf(x)) # so it integrates to one

expectedValue <- sum(x*f)

This a HUGE contribution to my programming skills in R.
Thanks a lot.
@startz thank you too.

Happy I could help.

As another side note: Using Monte Carlo Simulations will often times be useful, since a) R is pretty good at efficiently creating large numbers of draws and b) there are statistical laws (like the central limit theorem) which will support the estimation of such paramaters from a simulated distribution of values.

@startz This is actually not a completely valid numeric integration to calculate the expected value, is it? In your pdf, you do not use the x actively, so you end up with a horizontal line for all x \in \mathbb{R} instead of for all x \in [a,b]. I added the missing part for the sake of completeness:

a <- 5
b <- 10
pdf <- function(x,a,b){ifelse(a <= x & x <= b, 1/(b-a), 0)}
x <- seq(a,b,length.out = 50000)
f <- pdf(x,a,b)/sum(pdf(x,a,b)) # so it integrates to one

sum(x * f)

This will yield in the correct result of \frac{1}{2} \cdot (a + b) .

1 Like

Well that's embarrassing. Thanks @FactOREO for the correction.

1 Like

Alternate solution:

integrand <- function(x) {3*x^2}
integrate(integrand, lower = 0, upper = 1)

integrand <- function(x) {x*(3*x^2)}
integrate(integrand, lower = 0, upper = 1)

1 with absolute error < 1.1e-14
0.75 with absolute error < 8.3e-15

1 Like