levels() returns NULL for a tibble

I noticed an odd result when extracting the factor levels of a tibble column in one particular way. The call looks like it should work, and it does work with a data.frame, but not with a tibble. I am not sure whether this is intentional within the tidyverse or whether I am missing something. The reprex below illustrates the difference between calling levels(df[, "x"]) on a tibble versus a data.frame.

library(dplyr)
library(tibble)
df <- tibble::tribble(
         ~x,  ~y,  ~z,
        "A", "E", "I",
        "B", "F", "J",
        "C", "G", "K",
        "D", "H", "L"
        )
df <- df %>% 
  mutate(x = factor(x))
# Treated as a tibble
levels(df$x)           # This works
levels(df[, "x"])      # This does not: it returns NULL
levels(df[["x"]])      # This works
# Treated as a data.frame
df2 <- as.data.frame(df)
levels(df2$x)          # This works
levels(df2[, "x"])     # This works too
levels(df2[["x"]])     # This works

The relevant difference between data.frame and tibble is documented in the tibble reference, under "Subsetting tibbles".
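
A minimal sketch of why the result is NULL, assuming the tibble package is installed (`tb` and `sub` are illustrative names, not from the original post). As far as I understand it, levels() just reads the "levels" attribute of its argument. A one-column tibble has no such attribute; only the factor stored inside it does.

```r
library(tibble)

# Illustrative data, not the data from the original post
tb <- tibble(x = factor(c("A", "B", "C", "D")))

sub <- tb[, "x"]           # `[` on a tibble gives back a one-column tibble
inherits(sub, "tbl_df")    # TRUE: still a tibble, not a factor
is.null(levels(sub))       # TRUE: the tibble itself carries no "levels" attribute
levels(sub$x)              # the factor inside the tibble still has its levels
```

So nothing is lost by the subsetting. The levels are simply one layer further down than they would be with a data.frame.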

Yep! Just to have the relevant excerpt here as well:

Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a vector:

...

df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

...
To extract a single column use [[ or $ :

class(df2[[1]])
#> [1] "integer"

(... indicates that I'm skipping parts of what's in the full documentation)
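
For the original question, here is a hedged sketch of a few ways to get at the factor itself so that levels() has something to read. It assumes tibble and dplyr are installed, and `tb` is an illustrative tibble rather than the one from the reprex. The drop = TRUE argument and dplyr::pull() are alternatives to the [[ and $ forms quoted above.

```r
library(tibble)
library(dplyr)

# Illustrative data, not the data from the original post
tb <- tibble(x = factor(c("A", "B", "C", "D")), y = 1:4)

levels(tb[["x"]])               # [[ extracts the column vector
levels(tb[, "x", drop = TRUE])  # drop = TRUE asks `[` for the vector instead of a tibble
levels(pull(tb, x))             # dplyr::pull() extracts a single column as a vector
```

All three calls should give the same character vector of levels. Which one to use is mostly a matter of taste; [[ and $ need no extra arguments or packages.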

Awesome!! Thank you both! I figured that there might be a thoughtful reason behind that decision with tibbles.
