Hi, I want to create the following bins for my variable:
0-10, 11-25, 26-40, 41-55.
How do I do that, please?
I think cut() will do what you want.
DF <- data.frame(Vairable=sample.int(55,size = 25))
DF
#> Vairable
#> 1 35
#> 2 47
#> 3 1
#> 4 44
#> 5 55
#> 6 4
#> 7 48
#> 8 7
#> 9 53
#> 10 54
#> 11 23
#> 12 28
#> 13 10
#> 14 33
#> 15 34
#> 16 46
#> 17 25
#> 18 3
#> 19 5
#> 20 45
#> 21 15
#> 22 19
#> 23 30
#> 24 42
#> 25 41
DF$bins <- cut(DF$Vairable,breaks = c(0,10,25,40,55),include.lowest = TRUE)
DF
#> Vairable bins
#> 1 35 (25,40]
#> 2 47 (40,55]
#> 3 1 [0,10]
#> 4 44 (40,55]
#> 5 55 (40,55]
#> 6 4 [0,10]
#> 7 48 (40,55]
#> 8 7 [0,10]
#> 9 53 (40,55]
#> 10 54 (40,55]
#> 11 23 (10,25]
#> 12 28 (25,40]
#> 13 10 [0,10]
#> 14 33 (25,40]
#> 15 34 (25,40]
#> 16 46 (40,55]
#> 17 25 (10,25]
#> 18 3 [0,10]
#> 19 5 [0,10]
#> 20 45 (40,55]
#> 21 15 (10,25]
#> 22 19 (10,25]
#> 23 30 (25,40]
#> 24 42 (40,55]
#> 25 41 (40,55]
Created on 2022-06-04 by the reprex package (v2.0.1)
You can also assign labels to the bins. Because your data are integers, "11-25" might be clearer than "(10,25]".
DF$bins <- cut(DF$Vairable,
breaks = c(0,10,25,40,55),
labels = c("0-10", "11-25", "26-40", "41-55"),
include.lowest = TRUE)
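A quick way to confirm that the labelled bins treat the boundary values as intended is to cut a small vector made of exactly those boundaries. This is a minimal sketch; the vector `x` is made up for illustration and is not the data from the question.

```r
# Illustrative data: the lower and upper end of every requested bin
x <- c(0, 10, 11, 25, 26, 40, 41, 55)

bins <- cut(x,
            breaks = c(0, 10, 25, 40, 55),
            labels = c("0-10", "11-25", "26-40", "41-55"),
            include.lowest = TRUE)

# Show each value next to the bin it was assigned to
data.frame(x, bins)
```

Every value should land in the bin whose label contains it, e.g. 10 in "0-10" and 11 in "11-25".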
Hi, and thank you both.
What do ( and ] mean here?
Do I need to specify them, or does the cut() function do it by itself?
Some of the bins are shown between [ ] and some between ( ].
An explanation would be much appreciated, thank you.
If I need a value to be between 11 and 25, what is the best way to do it? I mean that both 11 and 25 are included in that particular bin.
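(10,25] means the bin 10 < x <= 25: the ( means the endpoint is excluded (<) and the ] means it is included (<=). Similarly, [0,10] means 0 <= x <= 10. You do not need to specify these brackets; cut() generates them from its arguments. By default (right = TRUE) every interval is closed on the right, and include.lowest = TRUE additionally closes the first interval on the left, which is why only that one appears as [0,10]. Since your data are integers, (10,25] acts as the bin 11-25: it excludes 10 but accepts 11 through 25.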
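To see where the brackets come from, here is a small sketch that prints the interval labels cut() builds with the default right = TRUE and with right = FALSE, using the breaks from this thread. The data 0:55 only serves to produce the factor; the levels depend on the breaks alone.

```r
breaks <- c(0, 10, 25, 40, 55)

# Default right = TRUE: intervals are (a,b];
# include.lowest = TRUE also closes the FIRST interval on the left.
lv_right <- levels(cut(0:55, breaks = breaks, include.lowest = TRUE))
lv_right
# "[0,10]"  "(10,25]" "(25,40]" "(40,55]"

# right = FALSE: intervals are [a,b);
# include.lowest = TRUE now closes the LAST interval on the right.
lv_left <- levels(cut(0:55, breaks = breaks, right = FALSE, include.lowest = TRUE))
lv_left
# "[0,10)"  "[10,25)" "[25,40)" "[40,55]"
```

Either way, all intervals are closed on the same side, so a value equal to a break always falls into exactly one bin.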
Is it possible to do something like
[10,25], so that 10 <= x <= 25?
You cannot define [10,25] in the middle of your range using the cut() function. The problem is that the neighboring ranges would then have to be [0,10) and (25,40] so that the boundary values 10 and 25 do not match two bins. cut() cannot mix intervals like that: it closes every interval on the same side (on the right by default, on the left with right = FALSE). You can manually define whatever bins you want with the case_when() function from the dplyr package.
library(dplyr)  # provides mutate() and case_when(); the native pipe |> needs R >= 4.1
DF <- data.frame(Variable = sample.int(55, size = 25))
DF <- DF |> mutate(bins = case_when(
  Variable >= 0 & Variable <= 10 ~ "0-10",
  Variable >= 11 & Variable <= 25 ~ "11-25",
  Variable >= 26 & Variable <= 40 ~ "26-40",
  Variable >= 41 & Variable <= 55 ~ "41-55",
  TRUE ~ "Out of range"  # catch-all for values not matched above
))
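This produces the same bins as the cut() answers above, but because each bin is its own condition, you can tune the boundaries however you want, for example making both ends of a middle bin inclusive.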
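For whole-number data there is also a way to get a bin like 10-25 (both ends included) out of cut(): put the breaks halfway between integers, so no value can ever sit exactly on a boundary and the open/closed question disappears. This is a workaround I am suggesting, not something from the answers above, and it assumes the variable holds whole numbers only; the vector `x` is illustrative.

```r
# Illustrative whole-number data chosen to hit every bin boundary
x <- c(0, 9, 10, 25, 26, 40, 41, 55)

# Breaks at .5 steps: no whole number equals a break,
# so (9.5, 25.5] contains exactly 10..25, i.e. both 10 and 25 are included.
bins <- cut(x,
            breaks = c(-0.5, 9.5, 25.5, 40.5, 55.5),
            labels = c("0-9", "10-25", "26-40", "41-55"))

data.frame(x, bins)
```

With non-integer data (e.g. 9.7) the labels would be misleading, so in that case the case_when() approach is the safer choice.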
Thank you, much appreciated.
I have just found a good read: