Can someone please explain this summary file on my data?
Thanks!
Call:
lm(formula = Home Run Factor ~ Average Outfield Dimension,
    data = Capstone_Baseball)

Residuals:
     Min       1Q   Median       3Q      Max
-0.31584 -0.09443 -0.01813  0.11085  0.27891

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -1.068759   1.531011  -0.698    0.491
Average Outfield Dimension   0.005738   0.004224   1.358    0.185

Residual standard error: 0.1525 on 28 degrees of freedom
Multiple R-squared:  0.06183,    Adjusted R-squared:  0.02832
F-statistic: 1.845 on 1 and 28 DF,  p-value: 0.1852
The first line, Call, simply states the model that was used. In this case you are regressing "Home Run Factor" on "Average Outfield Dimension" using the Capstone_Baseball data.
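If you want to reproduce this yourself, here is a minimal sketch of the call (assuming your data frame and columns are named exactly as in the output above; if the column names really do contain spaces, they need backticks in an R formula):

```r
# Fit a simple linear regression of Home Run Factor on Average Outfield Dimension
fit <- lm(`Home Run Factor` ~ `Average Outfield Dimension`,
          data = Capstone_Baseball)

# Reproduce the summary table shown above
summary(fit)
```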
The next block, Residuals, gives a rough idea of how the residuals are distributed. Here the min and max have similar absolute values, as do the first and third quartiles, so the distribution isn't notably skewed in one direction or the other.
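You can pull the same five-number summary straight from the fitted model; a quick sketch, assuming the model object is called fit as above:

```r
# Quantiles of the residuals -- the same numbers as the Residuals block
quantile(residuals(fit))

# A histogram is a quick visual check for skewness or outliers
hist(residuals(fit))
```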
The Coefficients block has two parts: the intercept and the independent variable. To make it easier to discuss, a coefficient row in a properly formatted post actually looks more like this (the numbers below are from a different model, used only for illustration; a reproducible example, called a reprex, is the best way to post output like this):

             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.496e+06  3.554e+05  18.279  < 2e-16 ***
The intercept is where the regression line crosses the y axis, in this case around 6.5 million. The standard error is a measure of the uncertainty in that estimate, the t value is a test statistic, and Pr(>|t|) is the probability of seeing a t value at least that large in absolute value if the true coefficient were zero. You want that number to be as low as possible. A value of < 2e-16 means the p-value is below machine epsilon, roughly the smallest relative difference double-precision floating point arithmetic can resolve, so R doesn't report anything smaller.
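Concretely, the t value is the estimate divided by its standard error, and Pr(>|t|) comes from a t distribution with the residual degrees of freedom (28 here). A sketch of that arithmetic using the intercept row from your own output:

```r
estimate  <- -1.068759
std_error <-  1.531011

t_value <- estimate / std_error            # about -0.698, as in the summary
p_value <- 2 * pt(-abs(t_value), df = 28)  # two-sided p-value, about 0.491
```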
The next coefficient is the slope of your regression line: each one-unit increase in Average Outfield Dimension is associated with an increase of about 0.0057 in Home Run Factor. But look at its p-value of 0.185, which is quite high: you'd expect a slope at least this far from zero roughly 18.5% of the time by chance alone, even if there were no real relationship. The F-statistic test on the bottom line is telling you the same thing.
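With a single predictor, the F test and the t test on the slope are the same test (F is just t squared), which is why the two p-values agree. You can check this with the numbers from your output:

```r
t_value <- 1.358                      # t value for the slope
t_value^2                             # about 1.84, the F-statistic (1.845) up to rounding

2 * pt(-abs(t_value), df = 28)        # p-value from the t test, about 0.185
1 - pf(t_value^2, df1 = 1, df2 = 28)  # p-value from the F test, about 0.185
```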
If the p-value were reasonably low, say 0.05 (which is still a one-in-twenty chance of the result being due to randomness), the R-squared values would tell you how much of the variation in the dependent variable is explained by the independent variable. In this case, not much: about 6%.
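For reference, R-squared is the proportion of the variance in the dependent variable that the model explains, and adjusted R-squared penalises it for the number of predictors. A sketch of the arithmetic behind the two values in your output (28 residual degrees of freedom with one predictor means n = 30 observations):

```r
r_sq <- 0.06183   # only about 6% of the variation is explained
n <- 30           # observations
p <- 1            # predictors

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
1 - (1 - r_sq) * (n - 1) / (n - p - 1)   # about 0.0283, as reported
```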