Histograms in Python versus R

Hi, I am plotting a histogram in Python with matplotlib and also in base R. I am using the same data and bins = 20, but the plots are not identical. I believe matplotlib uses half-open bins of the form [x, y), so to get the same bins in R I used right=FALSE.

plt.hist(df['pts'], bins=20)

hist(df$pts, breaks=20, right=FALSE)

How can I make the R histogram identical to the Python one? Thank you.

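I believe that when breaks is a single number, R treats it only as a suggestion for the number of bins; the actual break points are set to "pretty" round values, so you will not necessarily get 20 bins. In general, n breaks define n - 1 bins: every bin has a break on its right and the lowest bin also has a break on its left.
If you store the value returned by hist(), it will be a list with a breaks element that contains the actual breaks used.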

OUT <- hist(df$pts, breaks=20, right=FALSE)
OUT$breaks

Thank you. What is the equivalent statement in Python?

Thank you FJCC. I think you pointed me toward my issue. It looks like R's default binning creates round-number bin edges, but Python's does not. I would have to force the same equal-width bins in both to get identical graphs.
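For reference, here is a minimal sketch of the equal-width binning that, as far as I understand, NumPy and matplotlib apply when bins is an integer: the edges run from the smallest to the largest value in equal steps, with no rounding. The helper name equal_width_edges is made up for this illustration, and the code is plain Python rather than NumPy itself.

```python
def equal_width_edges(values, n_bins):
    """Return n_bins + 1 equally spaced edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # First n_bins edges are computed from the unrounded width.
    edges = [lo + i * width for i in range(n_bins)]
    # Use the exact maximum as the last edge so float error cannot exclude it.
    edges.append(hi)
    return edges


# Toy input: the minimum, one middle value, and the maximum of the g vector.
sample = [0.6807235, 4.2939500, 10.6146960]
edges = equal_width_edges(sample, 20)
print(len(edges), edges[0], edges[-1])
```

Because the width is not rounded, the edges generally do not land on round numbers, which is the difference from R's defaults described above.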

100 random gamma-distributed numbers from rgamma(100, shape=3, scale=1.5): mean 3 * 1.5 = 4.5, sd √3 * 1.5 ≈ 2.6

g <- c(1.0512512, 7.8399122, 2.6936226, 4.2939500, 2.7098544, 5.4573509,
3.2832411, 2.8593164, 4.9272202, 6.3668698, 0.9626574, 4.7971198,
0.9187876, 4.2365611, 10.6146960, 2.4271067, 6.3803476, 8.6018513,
3.0456869, 2.6207893, 5.1913523, 4.9758871, 5.1040121, 4.7248448,
5.9659899, 2.0971541, 5.3016751, 8.9969804, 3.4539753, 2.1611429,
1.8927263, 0.6807235, 1.8750507, 2.0106213, 4.9305607, 3.5458282,
7.9027088, 2.3239765, 5.7019243, 3.7696105, 8.0961550, 1.7250875,
2.2981261, 5.7404523, 8.9463263, 4.5670453, 7.2035282, 7.5141303,
4.3797226, 4.3210831, 2.3278066, 4.8762280, 4.1790747, 2.8071184,
4.1133451, 3.8563587, 2.9650323, 9.3623267, 3.9114523, 5.8011861,
2.0859649, 9.7414199, 6.2468257, 4.9252974, 4.5765937, 3.2322433,
5.8711536, 1.3186444, 5.5054662, 3.8736942, 4.8935354, 4.4345151,
3.8520213, 2.5700794, 3.7270397, 2.2596008, 2.2533915, 3.9576529,
4.8503113, 3.7897496, 2.7950351, 5.7089923, 2.7820445, 2.2366437,
7.2482578, 2.3815563, 1.9668393, 3.0388825, 2.4943421, 1.4674247,
4.4524486, 2.5357830, 8.4381612, 4.0802902, 4.9129094, 7.5555115,
5.2959941, 3.5959140, 7.7978177, 3.1431833)

OUT <- hist(g, breaks=20, right=FALSE, main = "Default breaks in R")
OUT$breaks # 0.5 1.0 1.5 2.0 2.5 3.0 , etc.
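To draw the same bins from Python, one option is to build the round-number edges by hand and pass them as the bins argument of plt.hist. The sketch below assumes the 0.5 step visible in the breaks above and simply widens the data range to multiples of that step. It does not reimplement the algorithm R uses to pick the step, and round_edges is a hypothetical helper name.

```python
import math


def round_edges(lo, hi, step):
    """Edges at multiples of step that cover the interval [lo, hi]."""
    first = math.floor(lo / step)  # last multiple of step at or below lo
    last = math.ceil(hi / step)    # first multiple of step at or above hi
    return [k * step for k in range(first, last + 1)]


# min(g) and max(g) from the data above; the 0.5 step is an assumption
# read off the printed breaks.
edges = round_edges(0.6807235, 10.6146960, 0.5)
print(edges[:4], edges[-1])
# plt.hist(g, bins=edges) would then use these edges instead of its defaults.
```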

# matplotlib-style edges: 21 equally spaced values from min(g) to max(g)
# (the bin width is left unrounded so the last edge is exactly max(g))
mn <- min(g)
mx <- max(g)
bin <- seq(mn, mx, length.out = 21)
bin # first edge is min(g), last edge is max(g)
hist(g, breaks=bin, right=FALSE, main = "Default breaks in Python")

Python code to print bin edges

import numpy as np
# g holds the same 100 values as the R vector above
bins = np.histogram_bin_edges(g, bins=20)
print("Bin edges:", bins)
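Once both programs share the same edges, the counts should agree as long as the interval convention matches. The sketch below counts values into half-open bins [a, b), with the last bin also including its right edge. That is my understanding of what NumPy does, and of what R's hist(right=FALSE) does with the default include.lowest=TRUE. count_in_bins is a made-up name and the data are toy values, not the thread's.

```python
from bisect import bisect_right


def count_in_bins(values, edges):
    """Count values into bins [a, b); the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the histogram range, skipped in this sketch
        i = bisect_right(edges, v) - 1  # bin whose left edge is <= v
        if i == len(counts):            # v equals the last edge
            i -= 1                      # so it goes into the last bin
        counts[i] += 1
    return counts


counts = count_in_bins([0.5, 1.0, 1.0, 1.7, 2.0], [0.5, 1.0, 1.5, 2.0])
print(counts)  # [1, 2, 2]
```

The value 1.0 sits exactly on an edge and is counted in the bin that starts there, while 2.0, the final edge, is kept in the last bin.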
