Here's one way to do it with some tidyverse tools:
library(dplyr)
library(tidyr)
library(lubridate)
df <- tibble(
  demo = c("73C23", "62R31", "62M26", "58C44", "53R02", NA, "78R58", "76C63")
)
df <- df %>%
  # Split after character 2 and after character 3
  separate(
    col = demo,
    into = c("birth_year", "birth_month_index", "other"),
    sep = c(2, 3)
  ) %>%
  mutate(
    birth_year = as.numeric(birth_year),
    # Two-digit years above 18 become 19xx, the rest 20xx; NA stays NA
    birth_year = case_when(
      birth_year > 18 ~ (1900 + birth_year),
      birth_year <= 18 ~ (2000 + birth_year),
      TRUE ~ birth_year
    )
  )
df
#> # A tibble: 8 x 3
#>   birth_year birth_month_index other
#>        <dbl> <chr>             <chr>
#> 1       1973 C                 23
#> 2       1962 R                 31
#> 3       1962 M                 26
#> 4       1958 C                 44
#> 5       1953 R                 02
#> 6         NA <NA>              <NA>
#> 7       1978 R                 58
#> 8       1976 C                 63
Note:
- separate for splitting your single variable into its parts
- case_when and mutate for reworking the new columns
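To make the positional split a bit more concrete, here is a minimal base R sketch of what cutting after characters 2 and 3 amounts to for a single value. The variable names (x, year_part, month_part, other_part) are made up for illustration, and this is only meant to mirror the idea, not how separate is implemented.

```r
# One example code from the data above
x <- "73C23"

# Characters 1-2, character 3, and everything from character 4 onward
year_part  <- substr(x, 1, 2)
month_part <- substr(x, 3, 3)
other_part <- substr(x, 4, nchar(x))

c(year_part, month_part, other_part)
```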
Here's a nice cheatsheet on these tasks:
For your birth month, I personally like to join against a lookup table keyed on your index.
The month indices you're working with are not totally clear to me, so the A-L mapping below is a guess, but I hope you get the idea:
Months <- tibble(
  month_index = LETTERS[1:12],
  month = lubridate::month(1:12, label = TRUE)
)
df <- df %>%
  left_join(
    Months,
    by = c('birth_month_index' = 'month_index')
  )
df
#> # A tibble: 8 x 4
#>   birth_year birth_month_index other month
#>        <dbl> <chr>             <chr> <ord>
#> 1       1973 C                 23    Mar
#> 2       1962 R                 31    <NA>
#> 3       1962 M                 26    <NA>
#> 4       1958 C                 44    Mar
#> 5       1953 R                 02    <NA>
#> 6         NA <NA>              <NA>  <NA>
#> 7       1978 R                 58    <NA>
#> 8       1976 C                 63    Mar
Created on 2018-10-31 by the reprex package (v0.2.1)
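Since R and M fall outside the A-L range I guessed at, they come back as NA after the join. If it helps, here is a rough sketch of how you might list the index codes that have no match in a lookup table, using anti_join. The names month_lookup, codes, and unmatched are placeholders of mine, and month.abb stands in for the real month labels, which I'm assuming you know better than I do.

```r
library(dplyr)

# Hypothetical lookup table: letters A-L mapped to month abbreviations
month_lookup <- tibble(
  month_index = LETTERS[1:12],
  month = month.abb
)

# A few index codes like the ones in the data, including a missing value
codes <- tibble(birth_month_index = c("C", "R", "M", NA, "R"))

# Keep the non-missing codes that have no partner in the lookup table
unmatched <- codes %>%
  filter(!is.na(birth_month_index)) %>%
  anti_join(month_lookup, by = c("birth_month_index" = "month_index")) %>%
  distinct(birth_month_index)

unmatched
```

Anything that shows up in unmatched is a code your lookup table still needs a row for.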
A good way to ask a question like this is with a REPRoducible EXample, or what folks call a reprex for short. A reprex makes it much easier for others to understand your issue and figure out how to help.