Unexpected results from levels() and labels()

I'm relatively new to R, but I have a good deal of experience with other statistical packages and programming languages. I've spent all day trying to figure out why I'm getting this odd result, and I'm really hoping I'm just missing something obvious.

I have a CSV data source that includes grade levels coded as integers. Since these are categorical, I am trying to create a factor variable with levels corresponding to the data source and labels that are more easily understood. However, it is not working at all as expected.
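For reference, here is a minimal sketch of what factor() does with the `levels` and `labels` arguments, using a small made-up vector of three codes rather than the thread's data. Each value is matched against `levels` and stored as the integer position of its match; `labels` then supplies the level names. The original numeric codes (-3, 0, 1 here) are not stored on the factor.

```r
# Hypothetical three-code example, not the thread's data:
# -3 = Birth, 0 = Kindergarten, 1 = 1st Grade
codes <- c(0, 1, -3, 0)

f <- factor(
  codes,
  levels = c(-3, 0, 1),                              # values to match against
  labels = c("Birth", "Kindergarten", "1st Grade"),  # names given to those levels
  ordered = TRUE
)

levels(f)      # "Birth" "Kindergarten" "1st Grade"
as.integer(f)  # 2 3 1 2: position of each value within `levels`
```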

Here is sample code that recreates the issue:

library(dplyr)

raw_data <- tibble(value = sample.int(16, 100, replace = TRUE) - 4)  # codes -3 to 12

my_data <- raw_data %>%
  mutate(
    grade = factor(
      value,
      levels = c(-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
      labels = c("Birth", "3K Preschool", "4K Preschool", "Kindergarten",
                 "1st Grade", "2nd Grade", "3rd Grade", "4th Grade",
                 "5th Grade", "6th Grade", "7th Grade", "8th Grade",
                 "9th Grade", "10th Grade", "11th Grade", "12th Grade"),
      ordered = TRUE
    )
  )

levels(my_data$grade)
labels(my_data$grade)

When I run the last two statements, levels() and labels(), I expect to get back the vectors I passed to factor() (the numeric codes and the label strings). Instead I get this:

> levels(my_data$grade)
 [1] "Birth"        "3K Preschool" "4K Preschool" "Kindergarten" "1st Grade"    "2nd Grade"    "3rd Grade"    "4th Grade"   
 [9] "5th Grade"    "6th Grade"    "7th Grade"    "8th Grade"    "9th Grade"    "10th Grade"   "11th Grade"   "12th Grade"  

> labels(my_data$grade)
  [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20" 
 [21] "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36"  "37"  "38"  "39"  "40" 
 [41] "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48"  "49"  "50"  "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60" 
 [61] "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "70"  "71"  "72"  "73"  "74"  "75"  "76"  "77"  "78"  "79"  "80" 
 [81] "81"  "82"  "83"  "84"  "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"  "97"  "98"  "99"  "100"

I really hope I'm just missing something obvious. I coded the entire project using the grade levels as integers, then realized I should really treat them as factors and ran into this problem straightaway. Thanks for any assistance you can provide.

David

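I don't think the labels() function does what you think it does. It does not extract the labels of a factor: there is no labels() method for factors, so the default method is used, and for an object without names it simply returns the element positions ("1" to "100" here) as strings. Likewise, levels() returns the labels you supplied, because factor() uses them as the level names and does not keep the original numeric codes. The following code shows, I think, that your conversion to a factor is working as expected. A value of 11 gives a grade labelled 11th Grade, whose underlying integer code is 15. Birth is the lowest level, with code 1; Kindergarten has code 4; and so on.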

library(dplyr)

raw_data <- tibble(value = sample.int(16, 100, replace = TRUE) - 4)  # codes -3 to 12

my_data <- raw_data %>%
  mutate(
    grade = factor(
      value,
      levels = c(-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
      labels = c("Birth", "3K Preschool", "4K Preschool", "Kindergarten",
                 "1st Grade", "2nd Grade", "3rd Grade", "4th Grade",
                 "5th Grade", "6th Grade", "7th Grade", "8th Grade",
                 "9th Grade", "10th Grade", "11th Grade", "12th Grade"),
      ordered = TRUE
    ),
    BackValue = as.numeric(grade)
  )

levels(my_data$grade)
#>  [1] "Birth"        "3K Preschool" "4K Preschool" "Kindergarten" "1st Grade"   
#>  [6] "2nd Grade"    "3rd Grade"    "4th Grade"    "5th Grade"    "6th Grade"   
#> [11] "7th Grade"    "8th Grade"    "9th Grade"    "10th Grade"   "11th Grade"  
#> [16] "12th Grade"
head(my_data)
#> # A tibble: 6 × 3
#>   value grade      BackValue
#>   <dbl> <ord>          <dbl>
#> 1    11 11th Grade        15
#> 2    12 12th Grade        16
#> 3     3 3rd Grade          7
#> 4     1 1st Grade          5
#> 5    11 11th Grade        15
#> 6     2 2nd Grade          6

Created on 2022-11-16 with reprex v2.0.2
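A small sketch of where the "1" to "100" comes from, and one way to get the numeric codes back. It uses made-up data; `grade_labels`, `grade_codes`, `g` and `original` are hypothetical names for illustration, and keeping a separate code vector is just one option, not something factor() does for you.

```r
# Made-up example data; `grade_labels` and `grade_codes` are hypothetical names
grade_labels <- c("Birth", "3K Preschool", "4K Preschool", "Kindergarten")
grade_codes  <- -3:0

g <- factor(c(0, -3, -1), levels = grade_codes, labels = grade_labels,
            ordered = TRUE)

# There is no labels() method for factors, so labels.default() is used:
# g has no names, so it returns the positions 1..length(g) as strings
labels(g)   # "1" "2" "3"

# levels() returns the label strings; the numeric codes are not stored
levels(g)   # "Birth" "3K Preschool" "4K Preschool" "Kindergarten"

# To recover the original codes, keep the code vector and index it
# by the factor's integer codes (positions within the levels)
original <- grade_codes[as.integer(g)]
original    # 0 -3 -1
```

With the full -3 to 12 coding used in this thread, the lookup is equivalent to `as.numeric(grade) - 4`, since level position i corresponds to code i - 4 (for example, 11th Grade: 15 - 4 = 11).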


Thank you! You are right that I misunderstood the purpose of the labels() function. I really appreciate your quick response.

David
