Automated factor leveling using a data dictionary/codebook

Hi,

So I have a dataset with ~250 variables and ~500K observations. I also have an accompanying data dictionary that specifies the value-label definitions for each nominal/ordinal variable in the dataset. The dictionary is organized in three columns: the variable name, the possible values, and the labels for those values. The dataset itself contains only the numeric codes, without labels.

I have generated some fake data that captures the problem. I'm drawing a blank on a good way to use the codebook/dictionary to automatically convert these variables to factors with the right labels. Any ideas?

library(dplyr)

data <- tribble(
  ~var,
  1,
  2,
  3,
  4
)

dict <- tribble(
  ~var_name, ~value, ~label,
  'var', 1, 'A',
  'var', 2, 'B',
  'var', 3, 'C',
  'var', 4, 'D'
)

Is this right?

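# Loop only over columns that have entries in the dictionary; otherwise
# `temp` is empty and the whole column is turned into NA.
for (n in intersect(names(data), dict$var_name)) {
  temp <- filter(dict, var_name == n)
  data[[n]] <- factor(data[[n]], levels = temp$value, labels = temp$label)
}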
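
With ~250 variables it may be handy to wrap the same idea in a function that also flags codes the dictionary doesn't know about, since `factor()` silently turns those into NA. A minimal base R sketch, assuming the dictionary columns are named `var_name`, `value`, and `label` as in the fake data; `apply_codebook` is a made-up helper name, and the `id` column is added here only to show that uncoded columns are left alone:

```r
# Hypothetical helper (not from any package): label every column of `data`
# that has entries in `dict`, and leave all other columns untouched.
apply_codebook <- function(data, dict) {
  # One small lookup table per variable named in the dictionary.
  lookups <- split(dict, dict$var_name)
  for (v in intersect(names(data), names(lookups))) {
    lk <- lookups[[v]]
    # Codes missing from the dictionary would silently become NA, so warn.
    unknown <- setdiff(unique(na.omit(data[[v]])), lk$value)
    if (length(unknown) > 0) {
      warning("Values in '", v, "' missing from dictionary: ",
              paste(unknown, collapse = ", "))
    }
    data[[v]] <- factor(data[[v]], levels = lk$value, labels = lk$label)
  }
  data
}

# Toy data in the spirit of the example above, plus an uncoded `id` column
# (an assumption added for illustration).
data <- data.frame(id = c(101, 102, 103, 104), var = c(2, 1, 4, 3))
dict <- data.frame(
  var_name = "var",
  value    = c(1, 2, 3, 4),
  label    = c("A", "B", "C", "D")
)

labelled <- apply_codebook(data, dict)
```

The level order follows the row order of the dictionary, so for ordinal variables the codebook rows should already be sorted the way you want the levels ordered.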