Change to factor type with import interface CSV file

Yoann · August 31, 2018, 1:38pm

Hi,
I'm using version 1.1.456 (the latest one).
I would like to import a CSV file. I select "From Text (readr)", but I have a problem changing the type of a variable to "factor".
The interface asks me: "Please insert a comma separated list of factors".
I enter a list of factors, like "man,woman" (without quotation marks), but it replaces all values of this variable with "NA".
I tested several possibilities: "man, woman" / ""man", "woman"" ...

How does it work? How can I change a variable to a factor?
Thank you very much!

EconomiCurtis · August 31, 2018, 2:02pm

I am not sure I understand the issue with the data loading step, but the "Please insert a comma-separated list of factors" step of the import interface sets what you enter as the factor levels for that column. If a value in the data does not exactly match one of the supplied levels, it is converted to NA.

For example, note the case-sensitivity error below: the level is given as "Man", but the data contains "man".


library(readr)
eg1 <- read_csv(
"C1,    C2,   C3,   C4 
100,   a1,   b1,   woman 
200,   a2,   b2,   man
300,   a3,   b3,   man
400,   a4,   b4,   woman", 
  col_types = cols(
    C4 = col_factor(levels = c('woman',"Man", "other"))))
#> Warning: 2 parsing failures.
#> row col expected           actual file
#>   2 C4  value in level set man    literal data
#>   3 C4  value in level set man    literal data
eg1
#> # A tibble: 4 x 4
#>      C1 C2    C3    C4   
#>   <int> <chr> <chr> <fct>
#> 1   100 a1    b1    woman
#> 2   200 a2    b2    <NA> 
#> 3   300 a3    b3    <NA> 
#> 4   400 a4    b4    woman

Created on 2018-08-31 by the reprex package (v0.2.0.9000).
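
For comparison, here is a minimal sketch of the same import with levels that match the data exactly. The object name `eg2` is only for illustration, and the inline data is the same toy example as above, not your actual file.

```r
library(readr)

# Same toy data as above, but the levels now match the values exactly
# (lower-case "man"), so no value is turned into NA.
eg2 <- read_csv(
"C1,C2,C3,C4
100,a1,b1,woman
200,a2,b2,man
300,a3,b3,man
400,a4,b4,woman",
  col_types = cols(
    C4 = col_factor(levels = c("woman", "man"))))

levels(eg2$C4)  # "woman" "man"
anyNA(eg2$C4)   # FALSE
```

In the import dialog, the equivalent would be entering the levels exactly as they are spelled in the file, including upper and lower case.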

One option is to import the file without the factor conversion, reading that variable as a character column, and then convert it to a factor in a later step with as.factor.

Note that factor levels have an order, and that order is often important. So if the order of your levels matters, be sure to relevel them. (Here's a handy tidyverse function that can help with that: https://forcats.tidyverse.org/reference/fct_relevel.html)

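library(readr)
library(dplyr)
# Read C4 as a plain character column first.
# The values are left unquoted: readr only treats double quotes as quoting,
# so single quotes would become part of the values (and of the levels).
df <- read_csv(
  "C1,    C2,   C3,   C4 
   100,   a1,   b1,   woman
   200,   a2,   b2,   man
   300,   a3,   b3,   man
   400,   a4,   b4,   woman")
# Then convert the column to a factor in a separate step.
df <- df %>% 
  mutate(
    C4 = as.factor(C4)
  )
df
#> # A tibble: 4 x 4
#>      C1 C2    C3    C4   
#>   <int> <chr> <chr> <fct>
#> 1   100 a1    b1    woman
#> 2   200 a2    b2    man  
#> 3   300 a3    b3    man  
#> 4   400 a4    b4    woman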

Created on 2018-08-31 by the reprex package (v0.2.0.9000).
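
On the point about level order, here is a small sketch of releveling with fct_relevel. It assumes the forcats package is installed; the vector `sex` is made up for illustration.

```r
library(forcats)

# as.factor()/factor() sort the levels alphabetically by default.
sex <- factor(c("woman", "man", "man", "woman"))
levels(sex)            # "man" "woman"

# fct_relevel() moves the named level(s) to the front;
# the values themselves are unchanged.
sex_releveled <- fct_relevel(sex, "woman")
levels(sex_releveled)  # "woman" "man"
```

In base R, factor(x, levels = c("woman", "man")) sets the same order when creating the factor.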

Yoann · August 31, 2018, 2:21pm

Ok, now I understand my error.
Thank you very much for your help!