Column Naming Issue using filter

Apologies for not submitting a reprex. I tried, but I am having issues with the reprex selection add-in.

The problem below is with column naming. I get an error on the 2B column name, but if I add quotes ("") as I did with 3B, all is fine. I do not understand what is wrong with the naming syntax. I have the same issue with the summation function, but there it will not accept the name even when quotes are added. Thanks for any help. JD

# collect all current batters stats from  https://www.rotowire.com/baseball/stats.php
season_batters <- read_csv("all_batter.txt", col_names = TRUE) %>%
  filter(PA > 0, Tm != "TOT") 

# filter for more than 25 at bats and calculate current batting average
bat_avg <- season_batters %>%
  filter(AB > 25) %>% 
  select(Name, Tm, G, AB, R, H, 2B,"3B", HR, SO, BB, OBP, SLG, OPS, "OPS+", TB) %>%
  mutate(average = H/AB) 

OUTPUT

Rows: 1523 Columns: 30
-- Column specification ----------------------------------------------------------------------------
Delimiter: ","
chr (4): Name, Tm, Lg, Pos Summary
dbl (26): Rk, Age, G, PA, AB, R, H, 2B, 3B, HR, RBI, SB, CS, BB, SO, BA, OBP, SLG, OPS, OPS+, TB...

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
Error: unexpected symbol in:
" filter(AB > 25) %>%
select(Name, Tm, G, AB, R, H, 2B"

Unlike data.frames, tibbles allow you to have column names that are not valid R names, for example names that start with numbers or contain spaces. However, in those cases you have to reference the name with backticks.

I think if you put

`2B`, `3B`

Then it should work.
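As a minimal sketch of the backtick suggestion, assuming dplyr is installed: the toy tibble below is made up for illustration and only borrows the column names from the question.

```r
library(dplyr)

# Toy data; the values are invented for illustration only.
batters <- tibble(
  Name = c("A", "B"),
  `2B` = c(3, 5),
  `3B` = c(1, 0)
)

# Backticks tell the R parser that 2B and 3B are names,
# not malformed numbers followed by a symbol.
doubles_triples <- batters %>%
  select(Name, `2B`, `3B`)
```

Without the backticks, R stops at the parsing stage, which is why the error reads "unexpected symbol" rather than anything about a missing column.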

Thanks Arthur. Right, it does work on that section of code, but when I carry the new data frame into my next chunk and use '2B' with dplyr::summarise, it errors out again. It doesn't work with plain 2B = sum(2B) either...

# Group players by team and generate a team collective batting average, hit, and run value
team_bat <- bat_avg %>%
  group_by(Tm) %>%
  summarize( H = sum(H), AB = sum(AB), R = sum(R), H = sum(H), '2B' = sum('2B')) %>%
  mutate(team_avg = H/AB) %>%
  mutate(RunHit = R/H) %>%
  mutate(runatBat = R/AB) %>%
    
  mutate(score = (RunHit + runatBat)* 1000 ) 
  #filter(score >= 1600)
  # top_n(15, score)
datatable(team_bat) # desire to filter about the highest 10 or so batting teams

OUTPUT
Error: Problem with summarise() column 2B.
i 2B = sum("2B").
x invalid 'type' (character) of argument
i The error occurred in group 1: Tm = "ARI".
Run rlang::last_error() to see where the error occurred.

Even when I convert it to a tibble before summarise, it errors out...

Tried both and each worked on select function but not on summarise. Already tried changing name using mutate. No help. Also converted to tibble and back - no joy.
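A likely explanation for the difference, sketched on made-up data: select() uses tidyselect, which accepts a string as a column name, while summarise() evaluates its arguments as ordinary R expressions, so sum("2B") tries to add up a piece of text. Backticks refer to the column in both places. The tibble and the name team_doubles below are hypothetical.

```r
library(dplyr)

# Toy data; team codes and counts are invented for illustration only.
batters <- tibble(
  Tm = c("ARI", "ARI", "ATL"),
  `2B` = c(3, 5, 2)
)

# select() treats the string "2B" as a column name, so quotes work here.
selected <- batters %>% select(Tm, "2B")

# Inside summarise(), "2B" would be a character value, not the column.
# Backticks make sum() see the numeric column instead.
team_doubles <- batters %>%
  group_by(Tm) %>%
  summarise(`2B` = sum(`2B`))
```

This would keep the original column names intact, at the cost of typing backticks each time the columns are used.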

Going back to drawing board. Will probably change import function (col_names = FALSE) and add my own names.

Thanks for suggestions...

You're right on the name change. I was using mutate to change a name instead of rename. This works and allows me to summarise later:

# collect all current batters stats from 2021 MLB Player Stats
season_batters <- read_csv("all_batter.txt", col_names = TRUE) %>%
  filter(PA > 0, Tm != "TOT") %>%
  rename(dbl = '2B', trpl = '3B')

# Group players by team and generate a team collective batting average, hit, and run value
team_bat <- bat_avg %>%
  group_by(Tm) %>%
  summarize( H = sum(H), AB = sum(AB), R = sum(R), dbl = sum(dbl), trpl = sum(trpl), HR = sum(HR)) %>%
  mutate(team_avg = H/AB) %>%
  mutate(RunHit = R/H) %>%
  mutate(runatBat = R/AB) %>%
  mutate(score = (RunHit + runatBat)* 1000 )
  #filter(score >= 1600)
  # top_n(15, score)
datatable(team_bat) # desire to filter about the highest 10 or so batting teams
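For the "highest 10 or so" step, one option is dplyr::slice_max(), which is available from dplyr 1.0.0 onward and supersedes top_n(). This is only a sketch on made-up scores; the team codes, the numbers, and the name top_teams are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for team_bat; the scores are invented for illustration only.
team_bat <- tibble(
  Tm = c("ARI", "ATL", "BOS", "CHC"),
  score = c(1510, 1620, 1700, 1580)
)

# Keep the n highest-scoring teams, sorted from highest to lowest.
# Use n = 10 on the real data; n = 2 keeps the toy example small.
top_teams <- team_bat %>%
  slice_max(score, n = 2)
```

The result could then be passed to datatable() in place of the full team_bat.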

:smiley:
