The background of this problem is that I have a large database of values and
a database of metadata (i.e. labels for the factor variables).
In order to add factor labels from the metadata to the data I am trying to use the labelled
function from haven
.
This will give the working data with the nice property that I can choose if I want to work with the data as numbers (as stored in the database) or using factors with labels (nice for plotting, readability etc.).
I have thousands of variables so I can not code this per variable. I also want to use this on the server backend in order to create datasets for visualisations, exports to R, SPSS, SAS etc.
A small example shows that I have no problem to solve my problem for one variable:
suppressWarnings(library(tidyverse))
d <- dplyr::tribble(~V1, ~V2, ~V3,
1, 2, NA,
2, 99, 'missing',
1, 2, NA,
1, 1, NA)
f <- dplyr::tribble(~Var, ~value, ~label,
'V1', 1, 'a',
'V1', 2, 'b',
'V1', 3, 'c',
'V2', 1, 'yes',
'V2', 2, 'no',
'V2', 99, 'missing')
# create a named list for labels
f_list <- f %>% split(.$Var) %>% map(~purrr::set_names(.$value, .$label))
# View f_list
f_list
#> $V1
#> a b c
#> 1 2 3
#>
#> $V2
#> yes no missing
#> 1 2 99
# Set labels on d$V1 and V2
d_new <- d %>% mutate(V1 = haven::labelled(V1, labels = f_list$V1),
V2 = haven::labelled(V2, labels = f_list$V2))
# Check result
d_new %>% haven::as_factor()
#> # A tibble: 4 x 3
#> V1 V2 V3
#> <fct> <fct> <chr>
#> 1 a no <NA>
#> 2 b missing missing
#> 3 a no <NA>
#> 4 a yes <NA>
# It Works!
Now. I can not figure out how to map over all columns in d without calling them directly in d
and f_list
That is to apply the labels in the named list f_list
to the correct column in d
.
The following will obviusly not work:
lapply(d, haven::labelled, labels = f_list)
#> Error: `x` and `labels` must be same type
Created on 2018-12-02 by the reprex package (v0.2.1)