How can i create a continuous variable in R? For instance imagine that f(x) = 3x^2 (for 0<x<1) is a probability density function (or it could literally be anything else) . How can i implement this variable in R? How can i do basic operations like finding the mean of f(x) or P(X=K)?
You can't really create a continuous variable in the sense that all you can create is a discrete grid. But you can certainly do something like
x <-seq(0,1,.001)
y<-3*x^2
plot(x,y)
Since you are in a programming language made for statistics, you can indeed handle continous probability distribution functions (the most common ones even with implemented functions) - but for custom functions only on a discrete grid, as @startz said.
Lets say you have a continous random variable defined as
You could implementent this (rather easy) function in R
with the usual function approach for anything else:
my_pdf <- function(x){
ifelse(x >= 0 & x <= 20, 1/200 * x, 0)
}
Using the curve()
function you can plot the pdf pretty easily:
curve(expr = my_pdf, from = -5, to = 25, n = 1000)
It is however a bit more challenging to accomplish something like the cdf from a pdf without any "real" analysis. You can do so by exploiting the fact, that a cdf is nothing else then a (cumulative) sum of prior values from the pdf, multiplied by the difference between consecutive x values, because
holds true for a continous random variable - we basically have to do the approximation of an integral with a rather large sum. Applying this to our little toy example:
cumsum(my_pdf(seq.default(-5,25,0.1)) * 0.1) |> plot(x = seq.default(-5,25,0.1), type = 'l')
will yield a rather nice plot of our CDF and the calculated values can be ordered with corresponding x values to obtain a set of pairwise numbers to approximate our CDF:
my_cdf <- function(lower, upper, step){
x <- seq.default(from = lower, to = upper, by = step)
pdf <- my_pdf(x)
cdf <- cumsum(pdf) * step
data.frame(x = x, pdf = pdf, cdf = cdf)
}
This can also be used to calculate probabilities - in case of P(X = x) this would be 0 in the continous case - but for others you can use the following:
P_of_X <- function(x, greater = FALSE){
### We need our CDF first
DF <- my_cdf(0,20,0.01)
### P(X = x) = 0
### P(X <= x) = F(x)
### P(X >= x) = 1 - F(x)
# Since we cannot cover all values, we need to find the nearest value
if(!(x %in% DF$x)) x <- DF[which.min(abs(x - DF$x)),]$x
P <- DF[which(DF$x == x),]$cdf
if(!greater){
P
} else {
1 - P
}
}
So now we are left with the expectation. Here we need random numbers from our distribution and will not come around a statistical method like the inversion method. So take the inverse of our CDF from above (which we calculated analitically) and we get \sqrt{400y} = 20 \cdot \sqrt{y} = F^{(-1)}_X(y) (ignore the negative solution).
Now we can generate random numbers from our given distribution with uniform distributed random draws between 0 and 1 and hence calculate values like the (empirical) mean:
inv_cdf <- function(x) 20 * sqrt(x)
mean(inv_cdf(runif(10000)))
which will be close to 13.333 as the theoretical mean tells us.
Kind regards
In the last part of @FactOREO's very nice post, he shows how to get the expected value using Monte Carlo integration through a "probability integral transformation (PIT)." An alternative is to due numerical integration. First, make a discrete version of a probability density, being sure it adds up to one. Then just multiply X by the density to get the expected value.
# suppose x is distributed uniform between a and b
pdf <- function(x){1/(b-a)}
x <- seq(a,b,length.out = 1000)
f <- pdf(x)/sum(pdf(x)) # so it integrates to one
expectedValue <- sum(x*f)
Happy I could help.
As another side note: Using Monte Carlo Simulations will often times be useful, since a) R is pretty good at efficiently creating large numbers of draws and b) there are statistical laws (like the central limit theorem) which will support the estimation of such paramaters from a simulated distribution of values.
@startz This is actually not a completely valid numeric integration to calculate the expected value, is it? In your pdf, you do not use the x actively, so you end up with a horizontal line for all x \in \mathbb{R} instead of for all x \in [a,b]. I added the missing part for the sake of completeness:
a <- 5
b <- 10
pdf <- function(x,a,b){ifelse(a <= x & x <= b, 1/(b-a), 0)}
x <- seq(a,b,length.out = 50000)
f <- pdf(x,a,b)/sum(pdf(x,a,b)) # so it integrates to one
sum(x * f)
This will yield in the correct result of \frac{1}{2} \cdot (a + b) .
Alternate solution:
integrand <- function(x) {3*x^2}
integrate(integrand, lower = 0, upper = 1)
integrand <- function(x) {x*(3*x^2)}
integrate(integrand, lower = 0, upper = 1)
1 with absolute error < 1.1e-14
0.75 with absolute error < 8.3e-15
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.