Summarizing by group based on variable values

rvrgl · February 23, 2021, 8:03pm

Hi, I am trying to summarize my data based on "year" and "site", so that there is a row for each unique site/year combination. I want to sum the "temp" variable for each year/site combination - for only those values over 30. Additionally, I want to find the mean of the "count" variable for each year/site combination for only those where the "temp" value is over 30.

I want the output to have 4 columns (site, year, hot_temp, hot_count) and 6 rows (with each site having a summarized 2018 and 2019 row). The output I get with my code below is only one row with two columns (hot_temp and hot_count).

I'd really appreciate any advice on this!

library(dplyr)

#make dataframe
site<-c("a","a","a","a","b","b","b","b","c","c","c","c")
year<-c("2018", "2018","2019","2019","2018","2018","2019","2019","2018","2018","2019","2019")
temp<-runif(12,20,42)
count<-round(runif(12,1,5))

#summarize data per site per year - for hot conditions
d<-data.frame(site,year,temp,count)

new<-d %>%
mutate(hot_temp=temp*(temp>30), #only temps >30
hot_count=case_when(temp>30~count)) %>% #counts when temps>30

group_by(site,year) %>%

summarise(hot_temp=sum(hot_temp), #sum hot temps per year per site
hot_count=mean(hot_count, na.rm = T)) #mean counts when temps over 30 per year per site

technocrat · February 23, 2021, 8:18pm

Filter first does most of this

suppressPackageStartupMessages({
  library(dplyr)
})
# make dataframe
site <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
year <- c("2018", "2018", "2019", "2019", "2018", "2018", "2019", "2019", "2018", "2018", "2019", "2019")
temp <- runif(12, 20, 42)
count <- round(runif(12, 1, 5))

# summarize data per site per year - for hot conditions
d <- data.frame(site, year, temp, count)

new <- d %>%
  filter(temp > 30) %>%
  group_by(site, year) %>%
  summarise(
    hot_temp  = mean(temp), # sum hot temps per year per site
    hot_count = sum(count)
  ) # mean counts when temps over 30 per year per site
#> `summarise()` has grouped output by 'site'. You can override using the `.groups` argument.

new
#> # A tibble: 6 x 4
#> # Groups:   site [3]
#>   site  year  hot_temp hot_count
#>   <chr> <chr>    <dbl>     <dbl>
#> 1 a     2018      34.3         5
#> 2 a     2019      36.6         2
#> 3 b     2018      37.1         7
#> 4 b     2019      32.4         3
#> 5 c     2018      32.5         7
#> 6 c     2019      33.2         3

Reverse the filter to pick up cases like site c 2019 to a holding object and then rbind

rvrgl · February 23, 2021, 8:39pm

That works perfectly, thanks so much!

system · March 2, 2021, 8:39pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.