Seeking feedback on API design for a `cut()` alternative

hughjonesd · June 4, 2022, 10:20am

The {santoku} package provides chop(), an alternative to base::cut(). Unlike cut(), chop() by default extends its breaks to cover all the data:

chop(1:5, c(2, 4))
[1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
Levels: [1, 2) [2, 4) [4, 5]

cut(1:5, c(2, 4))
[1] <NA>  <NA>  (2,4] (2,4] <NA> 
Levels: (2,4]
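
For comparison, cut() can be made to cover all the data by adding the data range to the breaks by hand. Here is a minimal sketch in base R, assuming sorted breaks and no missing values. cut_extended() is a hypothetical helper written for illustration, not part of santoku or base R:

```r
cut_extended <- function(x, breaks) {
  # Add the data minimum/maximum only when they fall outside the breaks.
  if (min(x) < min(breaks)) breaks <- c(min(x), breaks)
  if (max(x) > max(breaks)) breaks <- c(breaks, max(x))
  # Left-closed intervals; include.lowest = TRUE closes the last one on the right.
  cut(x, breaks, right = FALSE, include.lowest = TRUE)
}

res <- cut_extended(1:5, c(2, 4))
levels(res)
```

This should give the same three intervals as the chop() call above, labelled in cut()'s style (no space after the comma).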

Another parameter is close_end, which closes the rightmost interval on its right-hand side. At present, "rightmost" means the rightmost explicitly specified interval, even when the breaks are extended beyond it:

chop(1:5, c(2, 4), close_end = TRUE)
[1] [1, 2) [2, 4] [2, 4] [2, 4] (4, 5]
Levels: [1, 2) [2, 4] (4, 5] # <--- [2, 4] is now closed on the right

The advantage of this approach is that your explicitly specified intervals are always the same, whether or not the breaks are extended to cover extra data.

An alternative would be that close_end applies to the last interval, whether that is extended or not:

chop(1:5, c(2, 4), close_end = TRUE)
[1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
Levels: [1, 2) [2, 4) [4, 5] # <--- now [4, 5] is closed

The advantage of this approach is that it may be more intuitive. The disadvantage is that close_end then has no effect whenever the intervals are extended, because extended intervals are always closed at their outer end so as to cover min(x) and max(x):

chop(rnorm(5, sd = 2), -1:1)
[1] [-1, 0)      [-4.073, -1) [-4.073, -1) [-1, 0)      [1, 2.393]  
Levels: [-4.073, -1) [-1, 0) [1, 2.393]
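
To make the two candidate behaviours concrete, here is a rough base R sketch that returns interval indices rather than labels. interval_index() and its close argument are hypothetical names invented for this illustration, not santoku's API or implementation. The sketch assumes at least two sorted breaks and no missing values:

```r
interval_index <- function(x, breaks, close = c("explicit", "last")) {
  close <- match.arg(close)
  explicit_end <- max(breaks)
  # Extend the breaks to cover the data where needed.
  ext <- c(if (min(x) < min(breaks)) min(x),
           breaks,
           if (max(x) > max(breaks)) max(x))
  # Left-closed intervals, with the very last interval closed on the right.
  idx <- findInterval(x, ext, rightmost.closed = TRUE)
  if (close == "explicit") {
    # Points equal to the last explicit break join the interval ending there.
    idx[x == explicit_end] <- match(explicit_end, ext) - 1L
  }
  idx
}

explicit <- interval_index(1:5, c(2, 4), close = "explicit")  # 1 2 2 2 3
last     <- interval_index(1:5, c(2, 4), close = "last")      # 1 2 2 3 3
```

With close = "last", the result is the same as with no special handling at all whenever the breaks are extended, which is the disadvantage described above.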

Which approach do forum users think would be best?
