I have a data frame with some rows duplicated by a certain column. I want to take the average in those cases, leaving a data frame with only unique values by that column.
This is just a minimal reprex - but my 'real' data is much bigger and more complex. How can I make a new data frame that only lists "blue" once, with a value of 20 (since it's the average) - and leaves everything else untouched?
This is a typical example of grouping a data frame by one (or more) columns and summarizing the data in another column.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mydf <- data.frame("color"=c("blue", "blue", "red", "orange", "green"),
"value"=c(10,30,1,2,3))
mydf_2 <- mydf %>% group_by(color) %>% summarize(value=mean(value))
#> `summarise()` ungrouping output (override with `.groups` argument)
mydf_2
#> # A tibble: 4 x 2
#> color value
#> <chr> <dbl>
#> 1 blue 20
#> 2 green 3
#> 3 orange 2
#> 4 red 1