New user: filtering data to apply summary stats to segments of a dataset?

Hi All,
I'm really new to R and have figured out some basics, but I'm struggling to find a way to select only portions of my data to calculate summary statistics on. My data has approximately 250,000 specimens, which are divided into 6 size classes. I would like to calculate, for example, the mean weight for each size class within each location separately. I thought I was on track with this code:

ddply(Sandy_Point_collapsed, .(spsize), summarise, spweight=mean(spweight))

but the output was:
  spsize  spweight
1      1 0.4336955
2      2 0.4336955
3      3 0.4336955
4      4 0.4336955
5      5 0.4336955
6      6 0.4336955

So it looks like it is giving me a value for each size class, but it's just repeating the mean for the entire dataset each time.

Any suggestions would be greatly appreciated!

This should get you started:

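# Load dplyr, which provides tibble(), %>%, filter(), group_by() and summarise()
library(dplyr)

# Create some dummy data: 50 random values, each with a random letter label
d <- tibble(x = rnorm(50),
            lbl = sample(LETTERS, 50, replace = TRUE))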

# Calculate mean of x for group A only
d %>%
  filter(lbl == 'A') %>%
  summarise(mu_x_A = mean(x))

# Calculate mean of x for all groups
d %>%
  group_by(lbl) %>%
  summarise(mu_x = mean(x))
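
For the case in the question (mean weight for each size class within each location), the same pattern should extend to two grouping variables. This is only a minimal sketch: `spsize` and `spweight` come from the question, but `location` is an assumed column name and the data below is made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the real data. `location` is an assumed column
# name; `spsize` and `spweight` are the names used in the question.
set.seed(1)
specimens <- tibble(
  location = sample(c("Site 1", "Site 2"), 200, replace = TRUE),
  spsize   = sample(1:6, 200, replace = TRUE),
  spweight = runif(200, min = 0.1, max = 1)
)

# Mean weight (and specimen count) for each size class within each location
mean_by_group <- specimens %>%
  group_by(location, spsize) %>%
  summarise(mean_weight = mean(spweight), n = n(), .groups = "drop")

mean_by_group
```

If plyr is loaded alongside dplyr, writing `dplyr::summarise()` explicitly avoids picking up plyr's function of the same name, which ignores the grouping.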

Then head on over to R for Data Science and get further enlightened 🙂


Thanks Leon! That R for Data Science is a great resource, thanks for pointing it out. I may get through this thesis yet!
