I am new to R, currently learning dplyr and the tidyverse, and am not sure how to resolve the following issue. If I run this code:
pacman::p_load(tidyverse, nycflights13)
data(weather)
rain_daily <- weather %>%
  group_by(origin, year, month, day) %>%                  # drop hour as unit of observation
  summarize(sum_precip_day = sum(precip)) %>%             # sum hourly precip into daily
  group_by(year, month, day) %>%                          # drop airport as unit of observation
  summarize(median_precip = median(sum_precip_day)) %>%   # median daily precip in NYC area
  mutate(rain_any = ifelse(median_precip > 0, 1, 0)) %>%  # binary yes/no rain on a day
  mutate(rain_cat = case_when(                            # categorical: no rain to heavy rain
    0.00 <= median_precip & median_precip < 0.01 ~ 0,
    0.01 <= median_precip & median_precip < 0.10 ~ 1,
    0.10 <= median_precip & median_precip < 0.25 ~ 2,
    0.25 <= median_precip & median_precip < 0.50 ~ 3,
    0.50 <= median_precip & median_precip < 0.75 ~ 4,
    0.75 <= median_precip & median_precip < 1.00 ~ 5,
    1.00 <= median_precip ~ 6
  )) %>%
  mutate(across(c(1, 2, 3, 6), factor))                   # turn year, month, day, rain_cat into factor vars
I get the following error:
Error in mutate():
In argument: across(c(1, 2, 3, 6), factor).
Caused by error in across():
! Can't select columns past the end.
Location 6 doesn't exist.
There are only 4 columns.
If I drop the last mutate(across(...)) line, the code runs and rain_daily has 6 variables. I don't understand how R distinguishes columns from variables here. From experimenting with column names versus column positions in mutate(across(...)), it seems that across() does not see my first two variables, year and month. That makes me suspect the issue is caused by my group_by() calls, but I can't figure out how to fix it properly. Thanks in advance!
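
In case it helps narrow this down, here is a minimal sketch of what I suspect is happening. The tibble toy and its values are made up for illustration and are not taken from weather. The sketch assumes that summarize() drops only the last grouping variable (written out explicitly with .groups = "drop_last") and that across() does not count grouping columns when selecting by position. I am not sure whether ungroup() is the right or most idiomatic fix for my full pipeline.

```r
library(dplyr)

# Hypothetical toy data standing in for the per-airport daily totals (values are made up)
toy <- tibble(
  year   = c(2013, 2013, 2013, 2013),
  month  = c(1, 1, 2, 2),
  day    = c(1, 1, 1, 1),
  precip = c(0, 0.2, 0.5, 0.1)
)

# summarize() peels off only the last grouping variable (day),
# so the result is still grouped by year and month
grouped <- toy %>%
  group_by(year, month, day) %>%
  summarize(median_precip = median(precip), .groups = "drop_last")

# Inspect which grouping variables remain after summarize()
remaining_groups <- group_vars(grouped)
print(remaining_groups)

# After ungroup(), positional selection in across() counts every column,
# so positions 1, 2, 3 refer to year, month, day
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(across(c(1, 2, 3), factor))
```

If the same thing applies to my real data, it would explain the "only 4 columns" in the error: with year and month still acting as grouping variables, across() would see only day, median_precip, rain_any, and rain_cat.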