Surprising behavior of cut_width

The documentation for cut_width is a bit misleading. It describes the closed argument as:
closed: One of "right" or "left" indicating whether right or left edges of bins are included in the bin.

That is not true for the final bin: with closed="left" it is still closed on the right.

> a=runif(10, 0, 5)
> cut_width(a,  width=0.5, center=0.25, closed="left")
 [1] [2,2.5) [0.5,1) [0.5,1) [3,3.5) [1.5,2) [3.5,4] [2,2.5) [0,0.5) [2,2.5) [1.5,2)
Levels: [0,0.5) [0.5,1) [1,1.5) [1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4]

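This causes issues when combining binned datasets that share the same bin intervals but end at different bins: the same interval can appear as [3.5,4] in one dataset and [3.5,4) in another, so the factor levels do not match.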

Is there a simple solution? I want the final bin to be [3.5,4). I can think of kludgy ways to brute force it, but surely there is a better way.

I think this is necessary to ensure the bins cover the entire range. Otherwise, a value exactly equal to the upper bound would not fall into any bin. Something similar happens with closed="right": the first bin is closed on both ends.
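
The boundary effect can be reproduced with base cut() alone. This is only a sketch: it assumes cut_width passes its breaks to cut() with include.lowest = TRUE, and the breaks and values below are made up for illustration.

```r
# Illustrative breaks and values; 4 sits exactly on the top break.
breaks <- seq(0, 4, by = 0.5)
x <- c(0, 3.75, 4)

# Strictly half-open bins [lo, hi): the value 4 belongs to no bin and becomes NA.
half_open <- cut(x, breaks = breaks, right = FALSE)

# include.lowest = TRUE closes the outermost bin, so 4 is kept in [3.5,4].
closed_end <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)

data.frame(x, half_open, closed_end)
```

The trade-off is visible in the last row: either the top bin is closed on both ends, or the maximum value is dropped.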

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
x <- tibble(a=runif(10000, 0, 5)) |>
  mutate(
    a_right=ggplot2::cut_width(a,  width=0.5, center=0.25, closed="right"),
    a_left=ggplot2::cut_width(a,  width=0.5, center=0.25, closed="left"))

x |>
  summarize(
    N=n(),
    amin=min(a),
    amax=max(a),
    .by=c("a_right", "a_left")
  ) |>
  arrange(a_right)
#> # A tibble: 10 × 5
#>    a_right a_left      N     amin  amax
#>    <fct>   <fct>   <int>    <dbl> <dbl>
#>  1 [0,0.5] [0,0.5)  1002 0.000215 0.500
#>  2 (0.5,1] [0.5,1)  1015 0.500    1.00 
#>  3 (1,1.5] [1,1.5)  1021 1.00     1.50 
#>  4 (1.5,2] [1.5,2)   984 1.50     2.00 
#>  5 (2,2.5] [2,2.5)   973 2.00     2.50 
#>  6 (2.5,3] [2.5,3)   989 2.50     3.00 
#>  7 (3,3.5] [3,3.5)  1046 3.00     3.50 
#>  8 (3.5,4] [3.5,4)   999 3.50     4.00 
#>  9 (4,4.5] [4,4.5)   993 4.00     4.50 
#> 10 (4.5,5] [4.5,5]   978 4.50     5.00

Created on 2025-08-28 with reprex v2.1.1
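
For the original question (identical levels across datasets), one possible workaround is to skip cut_width and call base cut() with a break vector shared by all datasets. This is a sketch, not a ggplot2 feature: bin_left and the 0 to 5 range are hypothetical choices for illustration, and the range must extend past the largest value in any dataset because a value equal to the top break becomes NA.

```r
# Hypothetical shared breaks, chosen up front so every dataset gets the same levels.
breaks <- seq(0, 5, by = 0.5)

# right = FALSE gives [lo, hi) bins; include.lowest = FALSE (the default)
# keeps the last bin half-open as well.
bin_left <- function(x) {
  cut(x, breaks = breaks, right = FALSE, include.lowest = FALSE)
}

set.seed(1)
a <- runif(10, 0, 4)  # this dataset only reaches 4
b <- runif(10, 0, 5)  # this one reaches 5

# Both factors carry the same levels, regardless of each dataset's maximum.
identical(levels(bin_left(a)), levels(bin_left(b)))
levels(bin_left(a))
```

Because the levels come from the shared breaks rather than from each dataset's range, the binned results can be combined or joined without relabeling.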
