There’s probably a much easier way to do this (and not sure how widely useful it is!), but I was working with data which had missing codes (e.g. 777,888,999) mixed in with actual measured values, so I wanted to split them out so I could work with the actual data without having to filter out the missing codes all the time, and analyze the missing codes separately if needed.
The short version of the function just duplicates the columns (assumes that the first column is a list of ID numbers) and names each new column ‘originalname_NA’
(The longer version then replaces all the missing codes with NA in the original columns, and the original data with NA in the new columns - here:
na_columns <- function(data){
for (indexCol in 2:ncol(data)){
columnName <- colnames(data)[indexCol]
varname <- paste0(columnName, "_NA")
data <- mutate(data, !!varname := data[[indexCol]])
}
data
}
df <- data.frame(id_no = c(1,2,3,4,5,6,7,8,9,10),
age = c(45,32,44,23,47,999,45,999,50,38),
score = c(23,24,26,30,888,999,999,999,30,28),
score_2 = c(24,24,888,30,22,20,999,999,30,999))
na_columns(df)
# id_no age score score_2 age_NA score_NA score_2_NA
# 1 45 23 24 45 23 24
# 2 32 24 24 32 24 24
# 3 44 26 888 44 26 888
# 4 23 30 30 23 30 30
# 5 47 888 22 47 888 22
# 6 999 999 20 999 999 20
# 7 45 999 999 45 999 999
# 8 999 999 999 999 999 999
# 9 50 30 30 50 30 30
# 10 38 28 999 38 28 999