mydata <- cbind(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 72, 85, 91, 43), # Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2) # Gender, 1 = Male, 2 = Female
)
# I am trying to create a new column which would have age breaks.
# I've provided the matrix above as an example of what I am working on.
# I'd like to create a column which can take my age data and put it in categories of
# 0 - 29, 30 - 34, 35-39 up to 90+.
# Can anyone help me with the code please?
# I've tried using 'cut' function but it will only take a single number
# rather than what I've put.
First, here's a nice way to set up your data for the reprex:
df <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 72, 85, 91, 43), # Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2) # Gender, 1 = Male, 2 = Female
)
I think cut could still work for you (though I think you want slightly different breaks):
library(dplyr)
df %>%
  mutate(
    age_cut = cut(Age, breaks = c(0, 30, 35, 40, Inf))
  )
#>    ID Age Gender  age_cut
#> 1   1  22      1   (0,30]
#> 2   2  55      2 (40,Inf]
#> 3   3  90      2 (40,Inf]
#> 4   4   7      2   (0,30]
#> 5   5  14      1   (0,30]
#> 6   6 100      1 (40,Inf]
#> 7   7  72      2 (40,Inf]
#> 8   8  85      1 (40,Inf]
#> 9   9  91      1 (40,Inf]
#> 10 10  43      2 (40,Inf]
Created on 2018-05-21 by the reprex package (v0.2.0).
Hi Curtis, thanks for this. I've tried the code - but I am getting breaks of 0 - 30, 30 - 35, 35 - 40, 45 - 50 etc. What I need to achieve is 0 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84, 85 - 89, 90+
This is to match my patient data to some published data, which uses the above breaks. Not my choice of banding.
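Note the breaks argument in cut. You can adjust it to give whatever bins you'd like.
Note that the bin intervals in cut use standard interval notation: a square bracket, [ or ], marks an inclusive endpoint and a parenthesis, ( or ), marks an exclusive one. So (30,35] means greater than 30 and up to and including 35.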
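To make the notation concrete, here is a small sketch of which side of each bin is inclusive. The vector ages and the two result names are just made up for illustration; right is a real argument of cut and defaults to TRUE.

```r
# Four ages sitting right on the bin edges
ages <- c(29, 30, 34, 35)

# Default (right = TRUE): bins are closed on the right, e.g. (30,35]
default_bins <- cut(ages, breaks = c(0, 30, 35, 40))

# right = FALSE: bins are closed on the left, e.g. [30,35)
left_closed_bins <- cut(ages, breaks = c(0, 30, 35, 40), right = FALSE)

data.frame(ages, default_bins, left_closed_bins)
```

With the default, an age of exactly 30 lands in (0,30]; with right = FALSE it lands in [30,35), which is closer to a "30 - 34" style band.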
I have tried all morning to create the breaks I listed with the 'cut' function, which was why I posted. Not to worry - I have gone with the old-fashioned method below - a bit long-winded but it's done the trick!
df1$AgeCat[df1$Age >= 0 & df1$Age <= 29] <- "0 - 29"
df1$AgeCat[df1$Age >= 30 & df1$Age <= 34] <- "30 - 34"
df1$AgeCat[df1$Age >= 35 & df1$Age <= 39 ] <- "35 - 39"
df1$AgeCat[df1$Age >= 40 & df1$Age <= 44 ] <- "40 - 44"
df1$AgeCat[df1$Age >= 45 & df1$Age <= 49 ] <- "45 - 49"
df1$AgeCat[df1$Age >= 50 & df1$Age <= 54 ] <- "50 - 54"
df1$AgeCat[df1$Age >= 55 & df1$Age <= 59 ] <- "55 - 59"
df1$AgeCat[df1$Age >= 60 & df1$Age <= 64 ] <- "60 - 64"
df1$AgeCat[df1$Age >= 65 & df1$Age <= 69 ] <- "65 - 69"
df1$AgeCat[df1$Age >= 70 & df1$Age <= 74 ] <- "70 - 74"
df1$AgeCat[df1$Age >= 75 & df1$Age <= 79 ] <- "75 - 79"
df1$AgeCat[df1$Age >= 80 & df1$Age <= 84 ] <- "80 - 84"
df1$AgeCat[df1$Age >= 85 & df1$Age <= 89 ] <- "85 - 89"
df1$AgeCat[df1$Age >= 90] <- "90+"
# Create Age bands table ------------------------------------------------------
df2 <- as.data.frame.matrix(table(df1$AgeCat, df1$Gender))
# Converts exactly as laid out in table
It's useful to keep your code concise and avoid replication.
Unless I am misunderstanding something (wouldn't be the first time), cut can give you the same result, though with slightly different category text.
For example, for the first few categories:
df <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 33, 85, 91, 43), # Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2) # Gender, 1 = Male, 2 = Female
)
library(dplyr)
df %>%
  mutate(
    age_cut = cut(Age, breaks = c(0, 29, 34, 39, Inf))
  )
#>    ID Age Gender  age_cut
#> 1   1  22      1   (0,29]
#> 2   2  55      2 (39,Inf]
#> 3   3  90      2 (39,Inf]
#> 4   4   7      2   (0,29]
#> 5   5  14      1   (0,29]
#> 6   6 100      1 (39,Inf]
#> 7   7  33      2  (29,34]
#> 8   8  85      1 (39,Inf]
#> 9   9  91      1 (39,Inf]
#> 10 10  43      2 (39,Inf]
Created on 2018-05-21 by the reprex package (v0.2.0).
For whole-number ages, the category (29,34] is equivalent to your 30 - 34 range.
The breaks argument is where you set these ranges. For example, with breaks = c(0,29,34,39,Inf), the bin edges are 0, 29, 34, 39 and infinity, which gives the bins (0,29], (29,34], (34,39] and (39,Inf].
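Extending the same idea to all of the bands listed earlier in the thread might look something like the sketch below. It assumes ages are whole numbers, and the names age_breaks, age_labels and age_table are only illustrative. The labels and include.lowest arguments are part of cut: labels replaces the interval text with your own, and include.lowest = TRUE keeps an age of exactly 0 in the first band instead of turning it into NA.

```r
df <- data.frame(
  ID = 1:10,
  Age = c(22, 55, 90, 7, 14, 100, 33, 85, 91, 43),
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2)
)

# Bin edges: 0, then the top of each band (29, 34, ..., 89), then Inf for 90+
age_breaks <- c(0, 29, seq(34, 89, by = 5), Inf)

# Band labels written the same way as in the manual approach ("0 - 29", ..., "90+")
lower <- c(0, seq(30, 85, by = 5))
upper <- c(29, seq(34, 89, by = 5))
age_labels <- c(paste(lower, "-", upper), "90+")

df$AgeCat <- cut(
  df$Age,
  breaks = age_breaks,
  labels = age_labels,
  include.lowest = TRUE
)

# Cross-tabulate band by gender; because AgeCat is a factor, bands with
# no patients still appear (as rows of zeros) and stay in band order
age_table <- as.data.frame.matrix(table(df$AgeCat, df$Gender))
age_table
```

One side benefit over assigning character labels by hand is that the table rows come out in band order rather than alphabetical order.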
No, you're not misunderstanding anything, it's me being thick! I kept putting 0 - 34, 35 - 39 etc. in the 'cut' function (I know, silly move). Curtis, this is fab, thanks very much :)