I am not sure I understand the issue with the data loading step,
but the import data interface step of "Please insert a comma-separated list of factors" uses whatever you supply as the factor levels for your data. Any value that does not exactly match one of the supplied levels is converted to NA.
For example, note the case-sensitivity error below: the level is supplied as "Man", but the data contain "man".
library(readr)
eg1 <- read_csv(
"C1, C2, C3, C4
100, a1, b1, woman
200, a2, b2, man
300, a3, b3, man
400, a4, b4, woman",
  col_types = cols(
    # "Man" is capitalised here, so it does not match "man" in the data
    C4 = col_factor(levels = c("woman", "Man", "other"))))
#> Warning: 2 parsing failures.
#> row col expected           actual file
#>   2 C4  value in level set man    literal data
#>   3 C4  value in level set man    literal data
eg1
#> # A tibble: 4 x 4
#> C1 C2 C3 C4
#> <int> <chr> <chr> <fct>
#> 1 100 a1 b1 woman
#> 2 200 a2 b2 <NA>
#> 3 300 a3 b3 <NA>
#> 4 400 a4 b4 woman
Created on 2018-08-31 by the reprex package (v0.2.0.9000).
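If the capitalisation of the typed levels can't be trusted, one possible workaround is to lower-case both the values and the levels before matching. This is only a minimal base R sketch: `c4` and `levels_typed` are hypothetical stand-ins for the C4 column (read as character) and for whatever was typed into the levels box.

```r
# Hypothetical values standing in for the C4 column, read as character
c4 <- c("woman", "man", "man", "woman")

# Hypothetical levels as typed by the user, with inconsistent capitalisation
levels_typed <- c("woman", "Man", "other")

# Exact matching: "man" is not in the level set, so those rows become NA
strict <- factor(c4, levels = levels_typed)

# Lower-casing both sides before matching avoids the mismatch
relaxed <- factor(tolower(c4), levels = tolower(levels_typed))

sum(is.na(strict))   # how many values failed to match a level
sum(is.na(relaxed))  # how many still fail after lower-casing
levels(relaxed)
```

Whether lower-casing is appropriate depends on your data; if case carries meaning, correcting the typed levels is the safer fix.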
One option is to load the file without the extra load-as-factor step, reading that variable in as a character column instead, and then convert it to a factor in a later step with as.factor().
Bear in mind that factor levels have an order, and that order is often important; as.factor() sorts the levels alphabetically.
So if the order of your levels matters, be sure to relevel them. (Here's a handy tidyverse function that can help with that: https://forcats.tidyverse.org/reference/fct_relevel.html)
library(readr)
library(dplyr)
df <- read_csv(
"C1, C2, C3, C4
100, a1, b1, 'woman'
200, a2, b2, 'man'
300, a3, b3, 'man'
400, a4, b4, 'woman'")
df <- df %>%
  mutate(
    C4 = as.factor(C4)
  )
df
#> # A tibble: 4 x 4
#> C1 C2 C3 C4
#> <int> <chr> <chr> <fct>
#> 1 100 a1 b1 'woman'
#> 2 200 a2 b2 'man'
#> 3 300 a3 b3 'man'
#> 4 400 a4 b4 'woman'
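As a rough illustration of the releveling step, here is a small sketch using forcats::fct_relevel() alongside a base R alternative. The vector `c4` is a hypothetical stand-in for the C4 column, and putting "woman" first is just an assumed ordering for the example.

```r
library(forcats)

# Hypothetical values standing in for the C4 column, read as character
c4 <- c("woman", "man", "man", "woman")

# as.factor() sorts the levels alphabetically
f <- as.factor(c4)
levels(f)
#> [1] "man"   "woman"

# fct_relevel() moves the named level(s) to the front
releveled <- fct_relevel(f, "woman")
levels(releveled)
#> [1] "woman" "man"

# Base R alternative: state the full level order explicitly
explicit <- factor(c4, levels = c("woman", "man"))
levels(explicit)
#> [1] "woman" "man"
```

Either approach changes only the order of the levels; the values in the column stay the same.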