Hi, I am trying to summarize my data based on "year" and "site", so that there is a row for each unique site/year combination. I want to sum the "temp" variable for each year/site combination - for only those values over 30. Additionally, I want to find the mean of the "count" variable for each year/site combination for only those where the "temp" value is over 30.
I want the output to have 4 columns (site, year, hot_temp, hot_count) and 6 rows (with each site having a summarized 2018 and 2019 row). The output I get with my code below is only one row with two columns (hot_temp and hot_count).
I'd really appreciate any advice on this!
library(dplyr)
#make dataframe
site<-c("a","a","a","a","b","b","b","b","c","c","c","c")
year<-c("2018", "2018","2019","2019","2018","2018","2019","2019","2018","2018","2019","2019")
temp<-runif(12,20,42)
count<-round(runif(12,1,5))
#summarize data per site per year - for hot conditions
d<-data.frame(site,year,temp,count)
new<-d %>%
mutate(hot_temp=temp*(temp>30), #only temps >30
hot_count=case_when(temp>30~count)) %>% #counts when temps>30
group_by(site,year) %>%
summarise(hot_temp=sum(hot_temp), #sum hot temps per year per site
hot_count=mean(hot_count, na.rm = T)) #mean counts when temps over 30 per year per site