The {santoku} package provides chop(), an alternative to base::cut(). Unlike cut(), it automatically extends its breaks to cover all the data by default:
chop(1:5, c(2, 4))
[1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
Levels: [1, 2) [2, 4) [4, 5]
cut(1:5, c(2, 4))
[1] <NA> <NA> (2,4] (2,4] <NA>
Levels: (2,4]
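For comparison, base::cut() can be made to cover all the data by adding infinite outer breaks by hand. This is only a sketch in base R, not what chop() does internally, and the outer labels show -Inf and Inf rather than the range of the data:

```r
x <- 1:5
# Add -Inf and Inf as outer breaks so every value lands in some interval;
# right = FALSE makes the intervals left-closed, as in the chop() output.
res <- cut(x, c(-Inf, 2, 4, Inf), right = FALSE)
table(res, useNA = "ifany")
```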
Another parameter is close_end, which closes the rightmost interval. At present, this applies to the rightmost explicitly specified interval:
chop(1:5, c(2, 4), close_end = TRUE)
[1] [1, 2) [2, 4] [2, 4] [2, 4] (4, 5]
Levels: [1, 2) [2, 4] (4, 5] # <--- [2, 4] is now closed on the right
The advantage of this approach is that you always know how your explicitly specified intervals will be closed, irrespective of whether the breaks are extended to cover extra data.
An alternative would be for close_end to apply to the last interval, whether or not that interval comes from extending the breaks. The same call would then give:
chop(1:5, c(2, 4), close_end = TRUE)
[1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
Levels: [1, 2) [2, 4) [4, 5] # <--- now [4, 5] is closed
The advantage of this approach is that it may be more intuitive. The disadvantage is that close_end would have no effect whenever the breaks are extended, because extended intervals are always closed at their outer end so as to cover min(x) and max(x):
chop(rnorm(5, sd = 2), -1:1)
[1] [-1, 0) [-4.073, -1) [-4.073, -1) [-1, 0) [1, 2.393]
Levels: [-4.073, -1) [-1, 0) [1, 2.393]
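For what it's worth, base::cut() seems to follow the second convention: with right = FALSE, include.lowest = TRUE closes whichever interval is last among the breaks you pass in. A minimal base R sketch, with the outer breaks 1 and 5 written out by hand to stand in for extension:

```r
x <- 1:5
# Breaks 1 and 5 are added manually, mimicking extension to min(x) and max(x).
# With right = FALSE, include.lowest = TRUE closes the final interval on the right.
res <- cut(x, c(1, 2, 4, 5), right = FALSE, include.lowest = TRUE)
levels(res)
```

Here 4 and 5 fall into the same closed final interval, as in the second close_end output above.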
What do forum users think would be the best approach?