Create age groupings variable from single year ages

I would like to create an age-group variable from single-year ages.

The groups I would like are:

Under 16, 17-24, 25-34, etc.

My dataset is called Pop_2023 and the variable is just called age.

Thanks,
John

Take care with age boundaries. For example, "Under 16" and "17-24" would leave 16 year olds without a category.

Here's a method:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
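# Example data: 26 people with random ages between 10 and 40
Pop_2023 <- data.frame(person = letters,
                       age = sample(10:40, 26, replace = TRUE))
# case_when() checks the conditions in order and uses the first one that is TRUE,
# so each upper bound only needs to be stated once
Pop_2023 |>
  mutate(age_group = case_when(age < 16 ~ "Under 16",
                               age < 25 ~ "16-24",
                               age < 35 ~ "25-34",
                               TRUE ~ "35+"))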
#>    person age age_group
#> 1       a  22     16-24
#> 2       b  16     16-24
#> 3       c  21     16-24
#> 4       d  30     25-34
#> 5       e  29     25-34
#> 6       f  40       35+
#> 7       g  13  Under 16
#> 8       h  22     16-24
#> 9       i  17     16-24
#> 10      j  34     25-34
#> 11      k  34     25-34
#> 12      l  13  Under 16
#> 13      m  35       35+
#> 14      n  14  Under 16
#> 15      o  34     25-34
#> 16      p  20     16-24
#> 17      q  27     25-34
#> 18      r  18     16-24
#> 19      s  20     16-24
#> 20      t  12  Under 16
#> 21      u  40       35+
#> 22      v  20     16-24
#> 23      w  34     25-34
#> 24      x  10  Under 16
#> 25      y  28     25-34
#> 26      z  31     25-34

Created on 2024-10-22 with reprex v2.1.1
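
A base R alternative, offered as a sketch rather than the only way to do it: `cut()` with `right = FALSE` makes each interval include its lower bound, so a 16 year old falls in "16-24". Missing ages stay `NA` here, whereas the `TRUE ~ "35+"` fallback above would label them "35+". The sample ages below are made up for illustration, and the result is a factor rather than a character column.

```r
# Hypothetical sample data; the real Pop_2023 will have its own ages
Pop_2023 <- data.frame(person = c("a", "b", "c", "d", "e", "f"),
                       age = c(15, 16, 24, 25, 35, NA))

# right = FALSE gives the intervals [0,16), [16,25), [25,35), [35,Inf)
Pop_2023$age_group <- cut(Pop_2023$age,
                          breaks = c(0, 16, 25, 35, Inf),
                          labels = c("Under 16", "16-24", "25-34", "35+"),
                          right = FALSE)

Pop_2023
```

Adding another band only means adding one break and one label, which keeps the boundaries in a single place.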


That's great, thanks, but how do I add the new age group variable to the original dataset?

That is, keep the data that's already there and add the new age-group variable as the last column.

Many thanks again.

Just assign the result back to the original object (you essentially overwrite the old value with the version that has the extra column):

Pop_2023 <- Pop_2023 |>
  mutate(age_group = case_when(age < 16 ~ "Under 16",
                               age < 25 ~ "16-24",
                               age < 35 ~ "25-34",
                               TRUE ~ "35+"))
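
One way to check the result, sketched here with made-up ages: the original columns should still be present with `age_group` added at the end, and the youngest and oldest age in each group make any boundary slip easy to spot. The `age_ranges` name is just for illustration.

```r
library(dplyr)

# Hypothetical sample data standing in for the real Pop_2023
Pop_2023 <- data.frame(person = c("a", "b", "c", "d", "e", "f"),
                       age = c(12, 16, 24, 25, 34, 40))

# Overwrite the original object with the version that has the new column
Pop_2023 <- Pop_2023 |>
  mutate(age_group = case_when(age < 16 ~ "Under 16",
                               age < 25 ~ "16-24",
                               age < 35 ~ "25-34",
                               TRUE ~ "35+"))

# Original columns are kept; age_group is appended as the last column
names(Pop_2023)

# Youngest and oldest age that ended up in each group
age_ranges <- Pop_2023 |>
  group_by(age_group) |>
  summarise(min_age = min(age), max_age = max(age))
age_ranges
```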

Brilliant, many thanks!