If you add the columns with a constant value to the group_by
, then those will be kept. SampleDate
and day
won't normally be kept in a summary data frame, because we're averaging over them (I'll show a way around this in the second example). In the example below, I needed to update your sample data frame, as it's missing some commas and the year
vector has the wrong length. Also, for constant columns, you can use a vector of length one. If you ever do need to repeat a value, you can save some typing with, for example, rep("chemx", 11)
.
library(tidyverse)
library(lubridate)
monthlymean <- data.frame(
stringsAsFactors = FALSE,
AnalyteName = "chemx",
SampleDate = c("2012-01-09","2012-08-14",
"2013-05-21","2014-09-09","2010-12-28","2013-03-28",
"2010-10-12","2010-03-14","2010-09-16","2012-01-21","2010-10-30"),
WBID = 12345,
ProjectName = "Water",
day = c(9L, 14L, 21L, 9L, 28L, 28L, 12L, 14L, 16L,21L, 30L),
month = c(1L, 8L, 5L, 9L, 12L, 3L, 10L, 3L, 9L,1L,10L),
year=c(2012,2012,2013,2014,2010,2013,2010,2010,2012,2010,2010),
Result = c(0.23, 0.56, 9.56, 3.67, 2.67, 4.67, 2.56, 3.57, 2.45, 6.92, 2.56)
)
# Get columns with constant values
vars=names(monthlymean)[sapply(monthlymean, function(x) length(unique(x))==1)]
mms <- monthlymean %>%
group_by(across(c(vars, month, year, WBID))) %>%
summarise(Result = mean(Result, na.rm=TRUE))
mms
#> # A tibble: 10 x 6
#> # Groups: AnalyteName, WBID, ProjectName, month [7]
#> AnalyteName WBID ProjectName month year Result
#> <chr> <dbl> <chr> <int> <dbl> <dbl>
#> 1 chemx 12345 Water 1 2010 6.92
#> 2 chemx 12345 Water 1 2012 0.23
#> 3 chemx 12345 Water 3 2010 3.57
#> 4 chemx 12345 Water 3 2013 4.67
#> 5 chemx 12345 Water 5 2013 9.56
#> 6 chemx 12345 Water 8 2012 0.56
#> 7 chemx 12345 Water 9 2012 2.45
#> 8 chemx 12345 Water 9 2014 3.67
#> 9 chemx 12345 Water 10 2010 2.56
#> 10 chemx 12345 Water 12 2010 2.67
If you do want to carry SampleDate
along in the summary data, one option is nesting. We do this by converting SampleDate
into a list column within summarise
. For example:
mms2 <- monthlymean %>%
group_by(across(c(vars, month, year, WBID))) %>%
summarise(Result = mean(Result, na.rm=TRUE),
SampleDate = list(SampleDate))
mms2
#> # A tibble: 10 x 7
#> # Groups: AnalyteName, WBID, ProjectName, month [7]
#> AnalyteName WBID ProjectName month year Result SampleDate
#> <chr> <dbl> <chr> <int> <dbl> <dbl> <list>
#> 1 chemx 12345 Water 1 2010 6.92 <chr [1]>
#> 2 chemx 12345 Water 1 2012 0.23 <chr [1]>
#> 3 chemx 12345 Water 3 2010 3.57 <chr [1]>
#> 4 chemx 12345 Water 3 2013 4.67 <chr [1]>
#> 5 chemx 12345 Water 5 2013 9.56 <chr [1]>
#> 6 chemx 12345 Water 8 2012 0.56 <chr [1]>
#> 7 chemx 12345 Water 9 2012 2.45 <chr [1]>
#> 8 chemx 12345 Water 9 2014 3.67 <chr [1]>
#> 9 chemx 12345 Water 10 2010 2.56 <chr [2]>
#> 10 chemx 12345 Water 12 2010 2.67 <chr [1]>
You can unnest
mm2
to show each SampleDate
explicitly. But note that now we're back to the same number of rows as the original data fram, but Result
is still the average value, and it's repeated in rows 9 and 10, which are part of the same grouping that we averaged over.
mms2 %>% unnest(SampleDate)
#> # A tibble: 11 x 7
#> # Groups: AnalyteName, WBID, ProjectName, month [7]
#> AnalyteName WBID ProjectName month year Result SampleDate
#> <chr> <dbl> <chr> <int> <dbl> <dbl> <chr>
#> 1 chemx 12345 Water 1 2010 6.92 2012-01-21
#> 2 chemx 12345 Water 1 2012 0.23 2012-01-09
#> 3 chemx 12345 Water 3 2010 3.57 2010-03-14
#> 4 chemx 12345 Water 3 2013 4.67 2013-03-28
#> 5 chemx 12345 Water 5 2013 9.56 2013-05-21
#> 6 chemx 12345 Water 8 2012 0.56 2012-08-14
#> 7 chemx 12345 Water 9 2012 2.45 2010-09-16
#> 8 chemx 12345 Water 9 2014 3.67 2014-09-09
#> 9 chemx 12345 Water 10 2010 2.56 2010-10-12
#> 10 chemx 12345 Water 10 2010 2.56 2010-10-30
#> 11 chemx 12345 Water 12 2010 2.67 2010-12-28