Calculate mean values, but only for data that has at least 10 measurements.

NohmaOrama · September 12, 2021, 10:17pm

Hello guys and gals. I am a student and new to R. I have this data set. I have to tidy it up with the tidyverse and then calculate the mean height for each species that has at least 10 measurements.

My code so far looks like this:

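library(tidyverse)
library(dplyr)  # redundant: dplyr is already attached by tidyverse

biomass2015 <- read_csv(file = "biomass2015_H.csv")

# Reshape the ten height columns (H1 to H10) into long format
biomass2015_long <- biomass2015 |>
  pivot_longer(
    cols = c("H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8", "H9", "H10"),
    names_to = "Quadrant",
    values_to = "Value"
  )

# Drop rows with a missing height
biomass2015_long_noMV <- biomass2015_long |>
  drop_na(Value)

# Mean height per species (no minimum-count filter yet)
biomass2015_long_noMV |>
  group_by(species) |>
  summarise_at(vars(Value),
               list("Mean Height" = mean))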

It's for sure messy! My question is how to calculate the mean height for every species with at least 10 measurements. Any tips are more than welcome!

williaml · September 12, 2021, 10:30pm

Generate a column to count the species, then filter on that. You could use dplyr::add_tally().
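
As a rough sketch of that suggestion, assuming a made-up toy tibble rather than the real biomass data: add_tally() counts rows within the current grouping, so the data has to be grouped by species first. The names toy, counted and kept, and the threshold of 3, are placeholders for illustration only.

```r
library(dplyr)

# Hypothetical toy data: species "a" has 3 measurements, "b" has 1
toy <- tibble(
  species = c("a", "a", "a", "b"),
  Value   = c(1, 2, 3, 10)
)

counted <- toy %>%
  group_by(species) %>%
  add_tally() %>%   # adds column n: number of rows in each species group
  ungroup()

# Keep only species with at least 3 measurements (10 in the real data)
kept <- counted %>%
  filter(n >= 3)

kept
```

Without the group_by(), add_tally() would put the total row count on every row, which is no use for filtering by species.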

NohmaOrama · September 12, 2021, 10:37pm

Hey William thank you for your reply! Can you elaborate a little on your reply?
I know how many species I have, but some of them have fewer than 10 height measurements.

williaml · September 12, 2021, 10:50pm

Filter those out, then calculate the means. So do the filtering before the group_by(species).

NohmaOrama · September 12, 2021, 11:00pm

How do I generate a new column with the number of measurements for each species?

williaml · September 12, 2021, 11:10pm

Sorry, I meant dplyr::add_count(species), placed before the group_by().

You could do it this way:

biomass2015_long_noMV %>%
  add_count(species) %>% 
  filter(n >= 10) %>% 
  group_by(species) %>%
  summarise(mean_height = mean(Value))

# showing counts
biomass2015_long_noMV %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value), count = n()) %>% 
  filter(count >= 10)

NohmaOrama · September 12, 2021, 11:24pm

Thank you, man! I figured it out just as you posted.
My code looks like this:

biomass2015 |> 
  group_by(species) |>
  add_count(species) |>
  filter(n >= 10) |>
  summarise_at(vars(Value),
               list("Mean Height" = mean))

NohmaOrama · September 12, 2021, 11:26pm

Ohh, I did not realise that you had posted the code. If you have time, can you elaborate on the differences between the two pieces of code? Yours looks a lot cleaner.

williaml · September 12, 2021, 11:30pm

Yours is pretty much the same as my first one, except that I have used summarise() instead of summarise_at(), which is enough here because there is only one variable to summarise. Also check out dplyr::across() if you need to summarise more than one.
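
For the more-than-one-variable case, a minimal sketch with across() might look like the following. The columns height and weight and the toy values are invented for illustration (the real data only has Value), and across() needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data with two measurement columns (names are made up)
toy <- tibble(
  species = c("a", "a", "b", "b"),
  height  = c(1, 3, 10, 20),
  weight  = c(2, 4, 100, 300)
)

# Apply mean() to both columns per species; .names controls the output names
means <- toy %>%
  group_by(species) %>%
  summarise(across(c(height, weight), mean, .names = "mean_{.col}"))

means
```

This yields one row per species with columns mean_height and mean_weight.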

The second bit of code just moves the count into the summarise() so that it shows up in the output. It might be a tiny bit slower if you had a really large dataset.
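
To check that the two approaches agree, here is a small sketch on made-up data. The tibble toy and the threshold min_n are assumptions for illustration (the thread uses 10). One plausible reason the second version could be slower on big data is that it computes a mean for every species, including the ones that are then filtered away.

```r
library(dplyr)

# Hypothetical toy data: "a" has 3 measurements, "b" has 2
toy <- tibble(
  species = c("a", "a", "a", "b", "b"),
  Value   = c(1, 2, 3, 10, 20)
)
min_n <- 3  # stand-in for the threshold of 10 used in the thread

# Approach 1: count rows per species, filter the rows, then summarise
first <- toy %>%
  add_count(species) %>%
  filter(n >= min_n) %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value))

# Approach 2: summarise every species (mean and count), then filter the summary
second <- toy %>%
  group_by(species) %>%
  summarise(mean_height = mean(Value), count = n()) %>%
  filter(count >= min_n)

# Same means; the second result just carries the extra count column
all.equal(first$mean_height, second$mean_height)
```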
