I'm trying to learn R to play around with various baseball-related databases. I struggle with R questions because my technical background is not in data science or data wrangling and I don't know the vocabulary, so apologies up front.
I have a data table that consists of 4 columns: playerID, yearID, AtBats, BattingAverage. There are a bunch of rows, since there is one row per player for each season in Major League Baseball history. Of course, BattingAverage was calculated as Hits/AtBats, but Hits isn't in my data table.
I want to construct a table of the annual weighted mean and weighted standard deviation of BattingAverage. Now, I know how to get a table of the AtBat-weighted BattingAverage:
library(dplyr)
# AtBats-weighted mean of BattingAverage for each season
Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(y.BattingAverage = weighted.mean(BattingAverage, AtBats))
The way that I got the annual weighted variance was to first left_join the PlayerStats table with the Summary table, and then compute a different weighted.mean:
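# Attach each season's weighted mean to every player row
PlayerStats <- PlayerStats %>% left_join(Summary)
# Weighted mean of squared deviations (note: this overwrites Summary)
Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(var.BattingAverage = weighted.mean((BattingAverage - y.BattingAverage)^2, AtBats))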
And this works, but it sure seems to me that I'm doing extra steps. On top of that, my summary table lost the annual weighted batting average column, and now only contains the annual weighted variance.
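For reference, here is a minimal sketch of how both statistics might be computed in one pass, without the join. It assumes dplyr is loaded and relies on summarize() letting later expressions refer to columns created earlier in the same call. The small PlayerStats table is made up purely for illustration, and sd.BattingAverage is a hypothetical column name. The variance uses the same formula as above (AtBats-weighted mean of squared deviations, with no sample-size correction).

```r
library(dplyr)

# Toy data, made up for illustration only (not real player statistics)
PlayerStats <- data.frame(
  playerID       = c("a01", "b01", "a01", "b01"),
  yearID         = c(2000, 2000, 2001, 2001),
  AtBats         = c(100, 300, 200, 200),
  BattingAverage = c(0.200, 0.300, 0.250, 0.350)
)

Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(
    # AtBats-weighted mean for the season
    y.BattingAverage = weighted.mean(BattingAverage, AtBats),
    # Reuses y.BattingAverage, which was just created in this same call
    var.BattingAverage = weighted.mean((BattingAverage - y.BattingAverage)^2, AtBats),
    # Weighted standard deviation is the square root of the weighted variance
    sd.BattingAverage = sqrt(var.BattingAverage)
  )

print(Summary)
```

Because everything is created in a single summarize() call, the weighted mean column is kept alongside the variance and standard deviation. If the per-season values are still needed on each player row, Summary can be joined back afterwards with left_join(PlayerStats, Summary, by = "yearID").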