Thread for DATA 412/612 students to practice reprexes. No need to answer them!

Thanks, everyone, for attempting reprexes, and welcome to RStudio Community! If you're still confused about reprexes, please review this week's live session for an in-depth discussion.

Now, to fix the code!

It turns out that the real issue is a quirk of base R. read.csv() returns a regular data frame, and when we subset a single column of a data frame with single-bracket indexing (x[, "a"]), R simplifies the result to a plain vector by default:

```r
x <- data.frame(a = 1:5)
x
#>   a
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5

x[, "a"]
#> [1] 1 2 3 4 5
is.data.frame(x[, "a"])
#> [1] FALSE
```

ggplot() expects a data frame, not a vector. In base R, the fix is to set drop = FALSE, which keeps the result as a one-column data frame.

```r
x[, "a", drop = FALSE]
#>   a
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
is.data.frame(x[, "a", drop = FALSE])
#> [1] TRUE
```

Using this approach will fix our issue. I don't actually need the diabetes dataset here, so instead, I'll use the built-in cars dataset and make a histogram of the speed variable.

```r
library(ggplot2)

just_speed <- cars[, "speed", drop = FALSE]
ggplot(just_speed, aes(x = speed)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Notably, the original code used read.csv(), which returns a regular data frame, whereas readr::read_csv() returns a tibble, a subclass of data frame. Tibbles don't have this behavior: subsetting them with single brackets always returns a tibble, even when you select a single column:

```r
library(tibble)

y <- tibble(a = 1:5)
y[, "a"]
#> # A tibble: 5 x 1
#>       a
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4
#> 5     5
```

Using a tibble also solves our problem:

```r
cars <- as_tibble(cars)
just_speed <- cars[, "speed"]
ggplot(just_speed, aes(x = speed)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
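One caveat, sketched below under the assumption that the tibble package is installed: the no-drop behavior only applies to single-bracket indexing. Both [[ and $ extract the bare column vector from data frames and tibbles alike, so neither form will hand ggplot() a data frame.

```r
library(tibble)

x <- data.frame(a = 1:5)
y <- tibble(a = 1:5)

# Single brackets, one column: the data frame drops to a vector, the tibble does not
is.data.frame(x[, "a"])  # FALSE
is.data.frame(y[, "a"])  # TRUE

# Double brackets and $ return the plain column vector for both classes
is.data.frame(x[["a"]])  # FALSE
is.data.frame(y[["a"]])  # FALSE
identical(x$a, y$a)      # TRUE
```

So if you need a one-column data frame for plotting, stick with x[, "a", drop = FALSE] or a tibble with single brackets.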

This is the most wholesome thread ever :orange_heart:

