Merging rows based on values in two columns?

Hi! I have a dataset that has several rows for each "id". I'd like to merge the rows so that all of the data for each unique "id" and "year" combination are contained in one row. Below is an example of my current data:


I am trying to get an output like this:


I'd appreciate any advice!

The following approach works for your example data. If in your full data each combination of id and year have only one numeric value for maytemp, juntemp and jultemp (like in the example data), the solution will work fine as well.

  summarise(maytemp=sum(maytemp,na.rm = T),
            juntemp=sum(juntemp,na.rm = T),
            jultemp=sum(jultemp,na.rm = T))
1 Like

You can save some lines using across()


df <- data.frame(
  stringsAsFactors = FALSE,
                id = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
              year = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
           maytemp = c(20, NA, NA, 21, NA, NA, 23, NA, NA),
           juntemp = c(NA, 21, NA, NA, 20, NA, NA, 24, NA),
           jultemp = c(NA, NA, 22, NA, NA, 18, NA, NA, 25)

df %>% 
    group_by(id, year) %>% 
    summarise(across(c(maytemp:jultemp), sum, na.rm = TRUE))
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 3 x 5
#> # Groups:   id [2]
#>   id     year maytemp juntemp jultemp
#>   <chr> <dbl>   <dbl>   <dbl>   <dbl>
#> 1 A         1      20      21      22
#> 2 A         2      21      20      18
#> 3 B         1      23      24      25

Created on 2021-05-04 by the reprex package (v2.0.0)

1 Like

Yes, this works great, thanks a lot! I am marking the other answer as a solution because it requires fewer lines.

This is exactly what I needed. Many thanks!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.