Can someone please explain this summary file on my data?
Thanks!
Call:
lm(formula = Home Run Factor ~ Average Outfield Dimension,
    data = Capstone_Baseball)

Residuals:
     Min       1Q   Median       3Q      Max
-0.31584 -0.09443 -0.01813  0.11085  0.27891

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -1.068759   1.531011  -0.698    0.491
Average Outfield Dimension   0.005738   0.004224   1.358    0.185

Residual standard error: 0.1525 on 28 degrees of freedom
Multiple R-squared:  0.06183,    Adjusted R-squared:  0.02832
F-statistic: 1.845 on 1 and 28 DF,  p-value: 0.1852
The first line, Call, simply states the model that was used. In this case you are regressing "Home Run Factor" on "Average Outfield Dimension" using the Capstone_Baseball data.
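If you want to reproduce this yourself, here is a minimal sketch of the call (assuming your data frame and columns are named exactly as in the output above; if the column names really do contain spaces, they need backticks in an R formula):

```r
# Fit a simple linear regression of Home Run Factor on Average Outfield Dimension
fit <- lm(`Home Run Factor` ~ `Average Outfield Dimension`,
          data = Capstone_Baseball)

# Reproduce the summary table shown above
summary(fit)
```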
The next block, Residuals, gives a rough idea of how the residuals are distributed. Here the min and max have similar absolute values, as do the first and third quartiles, so the distribution isn't notably skewed in one direction or the other.
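You can pull the same five-number summary straight from the fitted model; a quick sketch, assuming the model object is called fit as above:

```r
# Quantiles of the residuals -- the same numbers as the Residuals block
quantile(residuals(fit))

# A histogram is a quick visual check for skewness or outliers
hist(residuals(fit))
```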
The Coefficients block has two parts: the intercept and the independent variable. To make it easier to discuss, a coefficient row in a properly formatted post actually looks more like this (the numbers below are from a different model, used only for illustration; a reproducible example, called a reprex, is the best way to post output like this):

             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.496e+06  3.554e+05  18.279  < 2e-16 ***
The intercept is where the regression line crosses the y axis, in this case around 6.5 million. The standard error is a measure of the uncertainty in that estimate, the t value is a test statistic, and Pr(>|t|) is the probability of seeing a t value at least that large in absolute value if the true coefficient were zero. You want that number to be as low as possible. A value of < 2e-16 means the p-value is below machine epsilon, roughly the smallest relative difference double-precision floating point arithmetic can resolve, so R doesn't report anything smaller.
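Concretely, the t value is the estimate divided by its standard error, and Pr(>|t|) comes from a t distribution with the residual degrees of freedom (28 here). A sketch of that arithmetic using the intercept row from your own output:

```r
estimate  <- -1.068759
std_error <-  1.531011

t_value <- estimate / std_error            # about -0.698, as in the summary
p_value <- 2 * pt(-abs(t_value), df = 28)  # two-sided p-value, about 0.491
```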
The next coefficient is the slope of your regression line: each one-unit increase in Average Outfield Dimension is associated with an increase of about 0.0057 in Home Run Factor. But look at its p-value of 0.185, which is quite high: you'd expect a slope at least this far from zero roughly 18.5% of the time by chance alone, even if there were no real relationship. The F-statistic test on the bottom line is telling you the same thing.
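With a single predictor, the F test and the t test on the slope are the same test (F is just t squared), which is why the two p-values agree. You can check this with the numbers from your output:

```r
t_value <- 1.358                      # t value for the slope
t_value^2                             # about 1.84, the F-statistic (1.845) up to rounding

2 * pt(-abs(t_value), df = 28)        # p-value from the t test, about 0.185
1 - pf(t_value^2, df1 = 1, df2 = 28)  # p-value from the F test, about 0.185
```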
If the p-value were reasonably low, say 0.05 (which is still a one-in-twenty chance of the result being due to randomness), the R-squared values would tell you how much of the variation in the dependent variable is explained by the independent variable. In this case, not much: about 6%.
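For reference, R-squared is the proportion of the variance in the dependent variable that the model explains, and adjusted R-squared penalises it for the number of predictors. A sketch of the arithmetic behind the two values in your output (28 residual degrees of freedom with one predictor means n = 30 observations):

```r
r_sq <- 0.06183   # only about 6% of the variation is explained
n <- 30           # observations
p <- 1            # predictors

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
1 - (1 - r_sq) * (n - 1) / (n - p - 1)   # about 0.0283, as reported
```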