Hi!
I'm new to R, but I have experience with other languages and statistical apps.
I'm doing a process analysis and finally learned how to make histograms, but when I selected the option to plot densities instead of frequencies, I noticed that the individual values of the distribution sum to 2 instead of 1. I verified that the height of each bar is correct. Can anybody explain that behavior ...
Please provide the code you ran so that we can more easily help you.
For now, maybe the code below will help clarify what's happening.
# Generate two density distributions with densities on the same x-scale
set.seed(5)
x1 = rnorm(1000, 0, 1)
x2 = rnorm(1000, 3, 1)
x1d = density(x1, from=-5, to=8)
x2d = density(x2, from=-5, to=8)
# Area under each curve is 1
sum(x1d$y * median(diff(x1d$x)))
#> [1] 1.000979
sum(x2d$y * median(diff(x2d$x)))
#> [1] 1.000978
plot(x1d, ylim=c(0, 0.45))
lines(x2d)
# Area under curve has doubled to 2 when we add the densities
lines(x1d$x, x1d$y + x2d$y, col="red")
# Renormalize so area under curve is 1
lines(x1d$x, (x1d$y + x2d$y)/2, col="blue", lwd=3)
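The same bookkeeping applies to a histogram drawn on the density scale: each bar's height is a density, so the quantity that totals 1 is height times bin width, not the heights themselves. Below is a minimal sketch of that check, assuming equal-width bins of 0.5; the names x, h and bin_width are only illustrative and this is not the original poster's code.

```r
set.seed(5)
x <- rnorm(1000, 0, 1)

# Equal-width bins of 0.5 that cover the whole range of x
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = breaks, plot = FALSE)

bin_width <- diff(h$breaks)    # 0.5 for every bin here

sum(h$density)                 # sum of bar heights: 1 / 0.5 = 2
sum(h$density * bin_width)     # total area of the bars: 1
```

With equal-width bins the heights sum to 1 divided by the bin width, so a total of 2 is what bins 0.5 wide would produce. Whether that is the cause in the original plot can only be confirmed from the actual code.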
This is not my real code. I'm using your sample data and the behavior is the same.
The chole example data was taken from another forum. It got worse: the sum of the densities totals 0.02.
Sorry for not understanding the format for posting code.
The density values will not sum to one. The density is represented with point values, usually evenly spaced. If you increase the number of points and reduce their spacing, the sum of the density points will increase. What may sum to one, for the right kind of function, is the area under the curve. That is what @joels calculated: