I'm relatively new to R but I have a good deal of experience with other statistical packages, programming languages, etc. I've spent all day trying to figure out why I am getting this odd result and I'm really hoping it is just me missing something obvious.
I have a CSV data source that includes grade levels coded as integers. Since these are categorical, I am trying to create a factor variable with levels corresponding to the data source and labels that are more easily understood. However, it is not working at all as expected.
Here is sample code that recreates the issue:
library(dplyr)

# Simulated grade codes from -3 to 12, in a single column named `value`
raw_data <- tibble(value = sample.int(16, 100, replace = TRUE) - 4)

my_data <- raw_data %>%
  mutate(
    grade = factor(
      value,
      levels = c(-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
      labels = c("Birth", "3K Preschool", "4K Preschool", "Kindergarten",
                 "1st Grade", "2nd Grade", "3rd Grade", "4th Grade",
                 "5th Grade", "6th Grade", "7th Grade", "8th Grade",
                 "9th Grade", "10th Grade", "11th Grade", "12th Grade"),
      ordered = TRUE
    )
  )

levels(my_data$grade)
labels(my_data$grade)
When I run the last two statements, levels() and labels(), I expected to get back the two vectors I passed to factor(), but instead I get this:
> levels(my_data$grade)
[1] "Birth" "3K Preschool" "4K Preschool" "Kindergarten" "1st Grade" "2nd Grade" "3rd Grade" "4th Grade"
[9] "5th Grade" "6th Grade" "7th Grade" "8th Grade" "9th Grade" "10th Grade" "11th Grade" "12th Grade"
> labels(my_data$grade)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
[21] "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40"
[41] "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60"
[61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" "73" "74" "75" "76" "77" "78" "79" "80"
[81] "81" "82" "83" "84" "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" "97" "98" "99" "100"
>
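To narrow it down, here is a smaller base-R check of the same pattern. It is only a sketch: the four-value vector `x` and the name `g` are made up for illustration. As far as I can tell from the help pages, the `labels` argument of factor() becomes the levels of the resulting factor, while labels() is an unrelated generic that, for a plain unnamed vector, returns the element indices as strings. That would explain the 1 to 100 output above.

```r
# Hypothetical mini example: three grade codes instead of sixteen
x <- c(0, -3, 12, 0)

g <- factor(
  x,
  levels  = c(-3, 0, 12),
  labels  = c("Birth", "Kindergarten", "12th Grade"),
  ordered = TRUE
)

levels(g)        # the labels passed to factor(), now stored as the levels
as.character(g)  # each observation shown by its label
as.integer(g)    # internal codes: position of each value in `levels`
labels(g)        # one string per element ("1", "2", ...), not the factor labels
```

If that reading is right, levels() is the call that reports the human-readable labels, and the original integer codes are no longer stored in the factor once it is created.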
I really hope I'm just missing something obvious. I coded the entire project using the grade levels as integers, then realized I should really treat them as factors and ran into this problem straightaway. Thanks for any assistance you can provide.
David