I have a data frame with an age column and a height column. I want to group the heights by age range (i.e. 21-25, 26-30, etc.), so that the output for the age group 21-25 is the heights 166, 175, 172, etc.
How can I do that? If possible, using the function split().
Why do you want to use the function split()?
Homework?
A suitable package for data frames would be dplyr, or are you not allowed to use that?
It's for an exercise. Other approaches are okay too.
The main purpose is that I want to make a boxplot of heights stratified by age. But rather than using every single age value, I want to group the ages into ranges instead.
Can you give an example?
I've never used split() before. I would use case_when() to group:
library(tidyverse)
df <- tribble(
  ~age, ~ht,
  25, 165,
  30, 170,
  35, 175,
  40, 170,
  45, 165
)
df <- df %>%
  mutate(
    age_g = case_when(
      age > 35 ~ ">35",
      age <= 35 ~ "<=35"
    )
  )
df
Yields:
# A tibble: 5 x 3
age ht age_g
<dbl> <dbl> <chr>
1 25 165 <=35
2 30 170 <=35
3 35 175 <=35
4 40 170 >35
5 45 165 >35
and the box plot:
df %>%
  ggplot(aes(x = age_g, y = ht)) +
  geom_boxplot()
That gives me a clue.
But suppose the age data are more varied.
Example:
age ht
23 173
20 168
28 170
22 166
27 175
26 175
I want to stratify so that the outcome looks more or less like this:
$`20-24`
173 168 166
$`25-29`
170 175 175
etc.
This is called 'binning'. Whatever you bin by you can later group by, but the binning comes first.
You haven't said whether you would choose the break points manually or whether a principled way of finding them is needed. The cut() function is a good built-in way to get linear breaks on a variable. I believe the binr library has other, perhaps more sophisticated, binning functions.
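As a minimal sketch of the cut() approach combined with split(), using the sample data from the question: the break points (5-year bins starting at 20) and the labels are assumptions chosen to match the desired output, not something fixed by cut() itself.

```r
# Sample data from the question above
df <- data.frame(
  age = c(23, 20, 28, 22, 27, 26),
  ht  = c(173, 168, 170, 166, 175, 175)
)

# Assumed break points: 5-year bins starting at 20.
# right = FALSE makes each interval closed on the left: [20, 25), [25, 30)
breaks <- seq(20, 30, by = 5)
labels <- c("20-24", "25-29")

# cut() turns the numeric ages into a factor of age groups
df$age_group <- cut(df$age, breaks = breaks, labels = labels, right = FALSE)

# split() returns a named list with one vector of heights per age group
heights_by_group <- split(df$ht, df$age_group)
heights_by_group
```

With these breaks, an age of exactly 30 or above falls outside the last interval and becomes NA, so the breaks would need extending for a wider age range. The same factor can be used for the plot, for example with boxplot(ht ~ age_group, data = df).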
Yes, I solved it with the cut() function. Thank you.
And thanks again for introducing the term "binning".