Sum data in one column based on condition in another column

Hi R gurus! I need some help wrangling some data (screenshot attached to this post). Apologies, I don't even know where to begin, so I don't have any R code yet...

I have a data frame of 1330 observations and I would like to sum the average column in chunks based on values in the m_off_ground column. Specifically, when a 0 occurs in the m_off_ground column, I want the sum calculation to 'restart' in the average column. Unfortunately, the 0s do not occur at regular intervals.
Furthermore, I'd like to create a column 'no_frames' which will tell me the number of rows (values) included in each sum calculation.
So, the first one would be the sum of rows 2-11 in the average column (since there is a 0 in row 12 in the m_off_ground column), and no_frames would equal 10.
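For the original, label-free formulation (restart the sum at every 0 in m_off_ground), one common trick is to use a running count of zeros as a chunk id. A minimal base-R sketch, assuming each chunk starts at the row holding the 0 and that m_off_ground has no NAs; the toy values below are made up, and only the column names come from the post:

```r
# Hypothetical toy data: column names follow the post, values are invented.
DF <- data.frame(
  m_off_ground = c(0, 1.2, 3.5, 0, 2.1, 4.0, 7.1),
  average      = c(10, 20, 30, 5, 15, 25, 35)
)

# Every 0 in m_off_ground starts a new chunk, so the cumulative count of
# zeros gives each row the id of the chunk it belongs to.
DF$chunk <- cumsum(DF$m_off_ground == 0)

# One row per chunk: summed `average` and the number of rows (no_frames).
# cumsum() never decreases, so unique() matches tapply()'s sorted order.
Stats <- data.frame(
  chunk     = unique(DF$chunk),
  sum_avg   = as.vector(tapply(DF$average, DF$chunk, function(x) sum(x, na.rm = TRUE))),
  no_frames = as.vector(tapply(DF$average, DF$chunk, length))
)
print(Stats)
```

Grouping by existing labels (as suggested later in the thread) is usually more robust, but this approach works when the zeros are the only marker of where a chunk begins.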

I know there must be a way to do this with a for loop, or perhaps even with dplyr, but as a still relatively new R user, I can't wrap my head around how to do it...

Thanks so much for any help or advice!

It looks to me like another way to express what you want is that you want a sum of the average column for each value of the station column. Is that true?


While that is true, I sample the same 12 stations for 1 month, so those station names are repeated throughout the data frame for ~20 different dates. Sorry, I should have included that in my original post.



So you want to do the calculation for each station on each date? The reason I am pushing on this is that it is far easier to do such calculations by taking advantage of labels in other columns rather than trying to step down a column and look at patterns of values.

Yes, you're right! It would be summing the average column for each station on each date. I don't know why I didn't think of that...
Also (sorry to add another layer to this): is there a way to "grab" the last value in the m_off_ground column for each station on each date? For the first one, that would be 7.0899.

If the last value is always the largest, I would do your calculations like this (naming your data frame DF):

```r
library(dplyr)
# One row per station/date: summed average, largest m_off_ground, row count
Stats <- DF %>% group_by(station, date) %>%
  summarize(SumAvg = sum(average, na.rm = TRUE), MaxOffGrd = max(m_off_ground), NumFrames = n())
```
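If the last value is not guaranteed to be the largest, dplyr's `last()` can take the final m_off_ground value in each group in place of `max()`. A sketch on made-up data; it assumes the rows are already in frame order within each station and date, because `last()` simply uses the current row order:

```r
library(dplyr)

# Hypothetical toy data using the column names from this thread.
DF <- data.frame(
  station      = c("A", "A", "A", "B", "B"),
  date         = rep("2020-06-01", 5),
  m_off_ground = c(0, 3.2, 7.0899, 0, 5.5),
  average      = c(1, 2, 3, 4, 5)
)

Stats <- DF %>%
  group_by(station, date) %>%
  summarize(
    SumAvg     = sum(average, na.rm = TRUE),
    LastOffGrd = last(m_off_ground),  # value in the last row of each group
    NumFrames  = n(),
    .groups    = "drop"               # return an ungrouped result
  )
print(Stats)
```

If the data are not in frame order, sort them with `arrange()` before summarizing.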

Wow, thank you so, so much FJCC! Such an easy and elegant way of doing that. Thanks for all your help :blush:
