I'm trying to understand how geom_histogram() determines binning.
When I run the following code,
ggplot(df, aes(body_mass_g)) + geom_histogram(bins = 40)
how does ggplot decide the starting and ending points for each bin?
When there are only a few data points, for example
library(ggplot2)
mydf <- data.frame(Sales = c(0, 5, 12, 19, 26, 29, 41, 82, 111, 400))
ggplot(mydf, aes(x = Sales)) + geom_histogram(bins = 5)
it seems to use binwidth = (max - min) / (bins - 1), with the bins together spanning [min - binwidth/2, max + binwidth/2). Here binwidth = (400 - 0) / 4 = 100, so the first bin is [-50, 50) and contains 0, 5, 12, 19, 26, 29, 41 (7 values).
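The arithmetic behind that guess can be checked in base R. This is only a sketch of the rule inferred from the plot (binwidth = (max - min) / (bins - 1), first break at min - binwidth/2, left-closed intervals); it is an assumption, not ggplot2's confirmed algorithm, and the variable names are made up for illustration.

```r
# Guessed rule from the small example above; NOT taken from ggplot2's source.
sales <- c(0, 5, 12, 19, 26, 29, 41, 82, 111, 400)
bins <- 5

# Assumed bin width: data range divided by (bins - 1)
binwidth <- (max(sales) - min(sales)) / (bins - 1)

# bins + 1 break points, starting half a bin width below the minimum
breaks <- min(sales) - binwidth / 2 + binwidth * (0:bins)

# Count values per left-closed interval [lower, upper), as assumed above
counts <- table(cut(sales, breaks = breaks, right = FALSE))

print(binwidth)
print(breaks)
print(counts)
```

To compare these numbers with what ggplot2 actually drew, layer_data() called on the plot object returns one row per bin with xmin, xmax, and count columns.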
However, the same rule doesn't seem to hold when there are many data points.