Running calculations on filtered data, please HELP!

I have a very simple data frame (named birdData), as I'm trying to get a good grasp on the fundamentals:

       bird_sp wingspan
1      sparrow       22
2   kingfisher       26
3        eagle      195
4  hummingbird        8
5      sparrow       24
6   kingfisher       23
7        eagle      201
8  hummingbird        9
9      sparrow       21
10  kingfisher       25
11       eagle      185
12 hummingbird        9

I then tried to use a pipe to filter all the values for sparrow and save the mean value of the wingspan.

sparrowsAvg <- birdData %>%
filter(bird_sp == "sparrow") %>%
mean(sparrowsAvg$wingspan)

print(sparrowsAvg)

I keep getting a return of NA. ChatGPT is telling me that since I used filter I NEED to use the summary function and then nest mean inside of it. I don't quite understand what the summary function does, or why I need to use it after filtering in a pipe like this.

Can anyone give me any insight into what exactly my code is doing and why? Furthermore, can anyone explain when and why to use summary?

Any support is GREATLY appreciated!

All the pipe does is take whatever is on its left and make that the first argument of the function on its right. Your code is effectively

tmp1 <- filter(birdData, bird_sp == "sparrow")
sparrowsAvg <- mean(tmp1, sparrowsAvg$wingspan)

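That second line does not make sense: mean() receives the whole data frame as its first argument, and since a data frame is not a numeric vector, it returns NA with a warning. You don't have to use summarize, you could write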

tmp1 <- filter(birdData, bird_sp == "sparrow")
sparrowsAvg <- mean(tmp1$wingspan)

In that case sparrowsAvg will be a single value, a vector of length 1. Or, you could use summarize()

sparrowsAvg <-  birdData %>% filter(bird_sp == "sparrow") %>%
   summarize(Avg = mean(wingspan))

In that case, sparrowsAvg will be a data frame (or a tibble, if birdData is one) with one row and one column. The column will be labeled Avg. More usual would be to do

sparrowsAvg <-  birdData %>% group_by(bird_sp) %>%
   summarize(Avg = mean(wingspan))

and get a column bird_sp and the Avg column showing the mean for the given species.
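
As a rough sketch of what that grouped version gives you, here is a self-contained example. It assumes birdData is rebuilt from the values posted in the question; the names speciesAvg and sparrowMean are only illustrative.

```r
library(dplyr)

# birdData rebuilt from the values in the question
birdData <- data.frame(
  bird_sp  = rep(c("sparrow", "kingfisher", "eagle", "hummingbird"), times = 3),
  wingspan = c(22, 26, 195, 8, 24, 23, 201, 9, 21, 25, 185, 9)
)

# One row per species, with the mean wingspan in the Avg column
speciesAvg <- birdData %>%
  group_by(bird_sp) %>%
  summarize(Avg = mean(wingspan))

print(speciesAvg)

# The sparrow value can still be picked out of the summary afterwards
sparrowMean <- speciesAvg %>%
  filter(bird_sp == "sparrow") %>%
  pull(Avg)
```

Grouping first means you compute every species at once and filter the small summary table, rather than repeating the filter for each species.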

I should have mentioned that in

sparrowsAvg <-  birdData %>% filter(bird_sp == "sparrow") %>%
   summarize(Avg = mean(wingspan))

the pipe is passing the output of filter() to summarize(). That data frame is used as the source of data in the calculation of Avg, so you can just use the bare column name wingspan and you don't have to write birdData$wingspan inside of summarize().
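
To tie this back to the original question, here is a small sketch of why the first attempt returned NA, and of one way to keep mean() at the end of the pipe if you want a single number rather than a one-row table. It assumes birdData is rebuilt from the values posted in the question. pull() is a dplyr function that was not used earlier in this thread, and the names piped, direct, naResult and fromSummarize are only illustrative.

```r
library(dplyr)

# birdData rebuilt from the values in the question
birdData <- data.frame(
  bird_sp  = rep(c("sparrow", "kingfisher", "eagle", "hummingbird"), times = 3),
  wingspan = c(22, 26, 195, 8, 24, 23, 201, 9, 21, 25, 185, 9)
)

# x %>% f(y) is evaluated as f(x, y), so these two give the same result
piped  <- birdData %>% filter(bird_sp == "sparrow")
direct <- filter(birdData, bird_sp == "sparrow")
identical(piped, direct)

# mean() on a whole data frame is NA (with a warning), because a
# data frame is not a numeric vector
naResult <- suppressWarnings(mean(direct))

# pull() extracts one column as a plain vector, which mean() can use
sparrowsAvg <- birdData %>%
  filter(bird_sp == "sparrow") %>%
  pull(wingspan) %>%
  mean()

# Same number as the summarize() version, but as a bare value
fromSummarize <- birdData %>%
  filter(bird_sp == "sparrow") %>%
  summarize(Avg = mean(wingspan))
all.equal(sparrowsAvg, fromSummarize$Avg)
```

Which form to use mostly depends on what you do next: a bare number is handy for further arithmetic, while the summarize() result stays a table that you can join or plot.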
