In more formal terms, a strike is a response or outcome variable (sometimes called a dependent variable), recorded as 1 for a strike or 0 for a non-strike.

The first thing to do is to decide what, besides a ball, constitutes a non-strike. A hit? Hitting the batter? Maybe the data set has already made that decision for you.
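As a minimal sketch of that decision, assuming a hypothetical `outcome` column with made-up labels (your data set will have its own names and categories), the recoding might look something like this:

```r
# Hypothetical pitch-level data; the column name `outcome` and its labels
# are invented for illustration and will differ in a real data set.
pitches <- data.frame(
  outcome = c("called_strike", "ball", "swinging_strike",
              "hit", "hit_by_pitch", "ball")
)

# Decide explicitly which outcomes count as a strike ...
strike_labels <- c("called_strike", "swinging_strike")

# ... and code the response as 1 (strike) or 0 (everything else).
pitches$Y <- as.integer(pitches$outcome %in% strike_labels)

# Cross-tabulate to confirm the recoding did what was intended.
table(pitches$outcome, pitches$Y)
```

Keeping the list of strike labels in one place makes it easy to change your mind later about, say, whether a foul ball belongs in it.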

To "predict" a strike, we consider a number of treatment variables (also called independent variables). Home/away, W/L record to date for the season, ERA, the batter's bases on balls, etc. are candidates. There's a host of other variables, some of which might actually show an association, such as the shortstop's batting average, but don't go there without some deep thinking about causal analysis.

Conventionally, the response variable is called $Y$ and the treatments $X_1, \dots, X_n$, and the goal is to determine the conditional probability of $Y$ given $X_1, \dots, X_n$. Let strikes be Y and the X variables be X1 = ERA, X2 = inning, and X3 = home/away.

```
# Fit a logistic regression; replace YOUR_DATA with the name of your data frame
mod <- glm(Y ~ X1 + X2 + X3, data = YOUR_DATA, family = "binomial")
summary(mod)
```

Or, if you've subsetted the data so that it includes just those four variables:

```
# Same model, using every remaining column of YOUR_DATA as a predictor
mod <- glm(Y ~ ., data = YOUR_DATA, family = "binomial")
summary(mod)
```

where `.` stands for every variable in the data other than Y.
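The coefficients that `summary(mod)` reports are on the log-odds scale, so they need a little translation before they mean much. Here is a rough sketch using simulated data (every number is made up, and `sim` merely stands in for `YOUR_DATA`) that converts them to odds ratios and gets a predicted probability for one hypothetical situation:

```r
set.seed(42)  # reproducible fake data

# Simulated stand-in for YOUR_DATA; all values are invented.
n <- 500
sim <- data.frame(
  X1 = runif(n, 2, 6),                                   # ERA
  X2 = sample(1:9, n, replace = TRUE),                   # inning
  X3 = factor(sample(c("home", "away"), n, replace = TRUE))
)

# Generate Y from a known logistic relationship so there is something to estimate.
true_prob <- plogis(1 - 0.4 * sim$X1 + 0.05 * sim$X2)
sim$Y <- rbinom(n, size = 1, prob = true_prob)

mod <- glm(Y ~ X1 + X2 + X3, data = sim, family = "binomial")

# Coefficients are log-odds; exponentiate to read them as odds ratios.
exp(coef(mod))

# Predicted probability of a strike for one hypothetical case.
new_case <- data.frame(
  X1 = 3.5,
  X2 = 7,
  X3 = factor("home", levels = levels(sim$X3))
)
p_hat <- predict(mod, newdata = new_case, type = "response")
p_hat
```

Without `type = "response"`, `predict()` returns values on the log-odds scale rather than probabilities.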

From there it can become tough sledding. As explained here, logistic regression doesn't work the way you might expect if you're coming from linear regression. For example, `geom_smooth()` really wants something continuous to work with.
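One way around this, sketched here with simulated data and assuming the ggplot2 package is installed, is to ask `geom_smooth()` for a binomial `glm` fit instead of its default smoother, so the curve stays between 0 and 1:

```r
library(ggplot2)

set.seed(1)  # reproducible fake data

# Simulated data, made up for illustration: one continuous predictor
# and a binary response.
sim <- data.frame(X1 = runif(300, 2, 6))
sim$Y <- rbinom(300, size = 1, prob = plogis(2 - 0.6 * sim$X1))

p <- ggplot(sim, aes(x = X1, y = Y)) +
  # Jitter vertically so the stacked 0/1 points are visible.
  geom_jitter(height = 0.05, width = 0, alpha = 0.3) +
  # Fit a logistic curve rather than the default smoother.
  geom_smooth(
    method = "glm",
    method.args = list(family = "binomial"),
    formula = y ~ x
  )
p
```

The fitted line is then the estimated probability of a strike at each value of `X1`, which is usually easier to read than the raw 0/1 points.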

The first thing to do after downloading is to understand what all the variables measure and whether they are continuous, binary, logical, categorical or, perhaps, just comments. Then comes exploratory data analysis, where well-selected plots can help.
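A first pass at that might look like the following sketch, in which `dat` is a made-up stand-in for the downloaded data and uses the same hypothetical Y, X1, X2 and X3 as above:

```r
set.seed(7)  # reproducible fake data

# Made-up stand-in for a downloaded pitch data set.
dat <- data.frame(
  Y  = rbinom(200, size = 1, prob = 0.45),
  X1 = runif(200, 2, 6),
  X2 = sample(1:9, 200, replace = TRUE),
  X3 = factor(sample(c("home", "away"), 200, replace = TRUE))
)

str(dat)      # the type of each variable
summary(dat)  # ranges, counts, and any missing values

# Share of strikes within each level of a categorical predictor.
strike_rate <- tapply(dat$Y, dat$X3, mean)
strike_rate

# A quick look at strike rate by inning.
barplot(
  tapply(dat$Y, dat$X2, mean),
  xlab = "Inning",
  ylab = "Proportion of strikes"
)
```

Because Y is coded 0/1, its mean within a group is simply the proportion of strikes in that group.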

It's hard to suggest more without knowing where you are at in terms of statistical and R experience.