How to explain summary file?

technocrat · August 2, 2019, 10:22pm

I did a blog post on this, a while back.

The first line, Call, simply states the model that was used. In this case you are regressing "Home Run Factor" on "Average Outfield Dimension" using the Capstone_Baseball data.

The next block, Residuals, gives you a rough idea of how the residuals are distributed. Here min and max have a similar absolute value, as do the first and third quartiles. So the distribution isn't notably skewed in one direction or the other.
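If you want to reproduce that block yourself, here is a minimal sketch. The Capstone_Baseball data isn't available here, so the built-in mtcars dataset is used as a stand-in, and the names `fit`, `res_summary` and `symmetry` are just illustrative.

```r
# Illustrative only: mtcars stands in for the Capstone_Baseball data.
fit <- lm(mpg ~ wt, data = mtcars)

# The same five numbers that summary(fit) shows under "Residuals":
# Min, 1Q, Median, 3Q, Max
res_summary <- quantile(residuals(fit))
print(res_summary)

# Rough symmetry check: compare |Min| with Max, and |1Q| with 3Q.
symmetry <- c(
  tails     = abs(res_summary[["0%"]])  / res_summary[["100%"]],
  quartiles = abs(res_summary[["25%"]]) / res_summary[["75%"]]
)
print(symmetry)  # ratios near 1 suggest no strong skew
```

This is only an eyeball check; a histogram or Q-Q plot of the residuals will tell you more.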

Coefficients has one row per term: the intercept and the independent variable. To make it easier to discuss, here is what the intercept row actually looks like. (See reproducible example, called a reprex, for how to post examples like these.)

             Estimate   Std. Error t value Pr(>|t|)    
(Intercept)  6.496e+06  3.554e+05  18.279  < 2e-16 ***
#

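The intercept is where the regression line crosses the y axis, in this case around 6.5 million. The standard error is a measure of the uncertainty of that estimate. The t value is a test statistic: the estimate divided by its standard error. Pr(>|t|) is the probability of seeing a t value at least that large in absolute value if the true coefficient were zero. The lower that number, the stronger the evidence that the coefficient differs from zero. R prints "< 2e-16" whenever a p-value falls below machine epsilon (about 2.2e-16), which it uses as a reporting floor; floating point arithmetic can represent far smaller numbers than that.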
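To see how those columns relate, here is a rough sketch. The first part uses the two rounded numbers from the intercept row above. The degrees of freedom aren't shown in the post, so the p-value part falls back on a stand-in model fitted to the built-in mtcars data; that model and all variable names are assumptions for illustration.

```r
# Numbers copied from the (Intercept) row above (already rounded).
estimate  <- 6.496e+06
std_error <- 3.554e+05
t_value   <- estimate / std_error
print(t_value)  # close to the reported 18.279; inputs are rounded

# Stand-in model, because the original data and its degrees of
# freedom are not available here.
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(fit))

# Two-sided p-value from the t distribution with the residual df.
t_fit    <- coefs["wt", "t value"]
p_manual <- 2 * pt(-abs(t_fit), df = df.residual(fit))
print(c(manual = p_manual, reported = coefs["wt", "Pr(>|t|)"]))

# The "< 2e-16" floor is machine epsilon, not the smallest double.
print(.Machine$double.eps)   # about 2.2e-16
print(.Machine$double.xmin)  # smallest positive normalized double
```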

The next coefficient is the slope of your regression line, but look at its p-value of 0.185, which is high. Basically, if there were no real relationship, you'd still expect a slope estimate at least this far from zero about 18.5% of the time simply by chance. You can see the same thing on the bottom line: with a single predictor, the F-statistic is the square of the slope's t value, so its p-value is the same.
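You can check the link between the slope's t test and the F-statistic on any one-predictor model. This sketch again uses mtcars as a stand-in for the original data, so the numbers won't match the output in the question; only the relationship carries over.

```r
# Stand-in one-predictor model (not the poster's data).
fit <- lm(mpg ~ wt, data = mtcars)
s   <- summary(fit)

t_slope <- coef(s)["wt", "t value"]
p_slope <- coef(s)["wt", "Pr(>|t|)"]

# s$fstatistic is a named vector: value, numdf, dendf
f_stat <- s$fstatistic
p_f <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
          lower.tail = FALSE)

# With one predictor, F equals t squared and the p-values agree.
print(c(t_squared = t_slope^2, F = f_stat[["value"]]))
print(c(p_slope = p_slope, p_F = p_f))
```

With more than one predictor this no longer holds: the F test then covers all slopes jointly.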

If the p-value were reasonably low, say below 0.05 (which still leaves a one in twenty chance of seeing such a result when there is no real effect), the R-squared values would tell you how much of the variation in the dependent variable is explained by the independent variable. In this case, not much.
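For what it's worth, R-squared is easy to compute by hand, which makes it clearer what "variation explained" means. A small sketch, once more with mtcars standing in for the original data and with illustrative variable names:

```r
# Stand-in model (not the poster's data).
fit <- lm(mpg ~ wt, data = mtcars)

# Residual sum of squares: variation left over after the fit.
rss <- sum(residuals(fit)^2)
# Total sum of squares: variation of the response around its mean.
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r_squared <- 1 - rss / tss
print(c(manual = r_squared, reported = summary(fit)$r.squared))

# With a single predictor, R-squared is the squared correlation.
r_from_cor <- cor(mtcars$mpg, mtcars$wt)^2
print(r_from_cor)
```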