# Getting mixed interpretations of the following linear regression output in R

Hello,

This is the output of a faceted scatter plot with a linear regression in each panel, aimed at studying the potential relationship between Unemployment Rate and Crime Rate for three different types of crime: Anti-Social Behaviour, Theft, and Violence & Sexual Offences.

Here's the plot and its output, and my interpretation follows:

``````
Call:
lm(formula = Crime_occurrences ~ Unemployment_rate + Crime, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-20871  -6755    362   4597  32818

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                          71252      21686   3.286  0.00168 **
Unemployment_rate                    -3508       5327  -0.658  0.51267
CrimeTheft                           -6613       3180  -2.080  0.04169 *
CrimeViolence and sexual offences     5606       3180   1.763  0.08287 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10550 on 62 degrees of freedom
Multiple R-squared:  0.1972,  Adjusted R-squared:  0.1584
F-statistic: 5.077 on 3 and 62 DF,  p-value: 0.003301
``````

My interpretation:

• t value is extremely low, suggesting no relationship between the two variables

• `Pr(>|t|)` is <0.05 for THEFT, suggesting a relationship between that and unemployment rate

• R-squared is very low, suggesting that the linear regression line wasn't very successful at capturing the values in the scatter plot (i.e. the residuals are all over the place, which is true; another sign of no relationship)

• F is fairly low, suggesting no relationship

• p value is <0.05, suggesting a strong relationship
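The last three lines of the summary are tied together arithmetically. As a sketch, the R code below recomputes the adjusted R-squared and the F-statistic from the printed Multiple R-squared (0.1972), taking n = 66 observations (62 residual degrees of freedom plus 4 estimated coefficients) and k = 3 coefficients besides the intercept, and then the p-value of that F-statistic. The overall F-test compares the full model with an intercept-only model, so its p-value refers to all three coefficients jointly rather than to Unemployment_rate or to any single crime type. The inputs are the rounded values from the printed output, so the results only match to the printed precision.

``````
# Values copied from the printed summary (rounded, so results are approximate)
r2     <- 0.1972          # Multiple R-squared
k      <- 3               # coefficients besides the intercept (numerator df)
df_res <- 62              # residual degrees of freedom
n      <- df_res + k + 1  # number of observations (66)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic: explained vs. unexplained variance per degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_res)

# p-value of the overall F-test (H0: all three non-intercept coefficients are zero)
p_overall <- pf(f_stat, k, df_res, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_overall), 4)
``````

A low R-squared and a significant overall F-test are not contradictory: the model explains only about 20% of the variance, but that is still more than would be expected by chance with 66 observations.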

My questions:

• Where is "Anti-Social Behaviour"? It's in my data frame but not in this output

• What does the p value refer to? Unemployment rate and WHICH type of Crime?

• How should I interpret the discrepancy between Theft's `Pr(>|t|)` being < 0.05 and its t value being not only low, but even negative?
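On the last question: each t value is simply the coefficient estimate divided by its standard error, so its sign just follows the sign of the estimate, and the two-sided p-value depends only on |t|. A t value of -2.08 is as far from zero as +2.08. The sketch below reproduces the t and p values for Unemployment_rate and CrimeTheft from the rounded estimates and standard errors in the printed output, using the 62 residual degrees of freedom.

``````
# Estimates and standard errors copied from the printed summary
est    <- c(Unemployment_rate = -3508, CrimeTheft = -6613)
se     <- c(Unemployment_rate =  5327, CrimeTheft =  3180)
df_res <- 62

# t value = estimate / standard error (the sign follows the estimate)
t_val <- est / se

# Two-sided p-value: probability of |T| at least this large under H0
p_val <- 2 * pt(-abs(t_val), df = df_res)

round(cbind(t = t_val, p = p_val), 4)
``````

Note that the CrimeTheft coefficient is an offset in Crime_occurrences relative to the baseline crime category, not a slope with respect to unemployment.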

Any suggestions would be highly appreciated!

I think you are misunderstanding what adding the Crime column to the formula in lm() does. It does not replicate what you have done in the plot, where Crime_occurrences is fit against Unemployment_rate separately for the subset of the data belonging to each value of Crime. Adding Crime to the lm() formula allows a different offset (intercept) in Crime_occurrences for each value of Crime, but there is still only one slope with respect to Unemployment_rate.
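One way to see this is to inspect the design matrix that lm() builds from the right-hand side of the formula. The sketch below uses a small hypothetical data frame with the three crime categories (the level spellings and unemployment values are made up, not the original data). With R's default treatment contrasts, the first factor level becomes the reference: it gets no column of its own and is absorbed into the intercept. For a character column the levels default to sorted order, so the anti-social behaviour category is the baseline, which is why it does not appear in the summary. The other two levels get offset columns, and there is a single Unemployment_rate column, i.e. a single slope.

``````
# Hypothetical data: the three crime categories with made-up unemployment rates
df <- data.frame(
  Crime = factor(rep(c("Anti-social behaviour", "Theft",
                       "Violence and sexual offences"), each = 2)),
  Unemployment_rate = c(4.1, 5.3, 4.1, 5.3, 4.1, 5.3)
)

# The first level is the reference category under treatment contrasts
levels(df$Crime)

# Design matrix for the right-hand side of Crime_occurrences ~ Unemployment_rate + Crime
mm <- model.matrix(~ Unemployment_rate + Crime, data = df)
colnames(mm)
``````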

I made a toy data set to illustrate this. Value depends on X for Group A but not for Group B. However, Group B is associated with Value being, on average, 5 higher than in Group A.

Fitting all of the data with a single line yields a large p value and a low R squared.

Adding Group to the fit still shows a large p value for X (the slope of the line), but it shows a significant effect for Group B: Value is higher when Group == B.
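Because X takes the same values (1, 2, 3) in both groups, the coefficients of the additive fit have a simple reading in this toy example: the GroupB estimate equals the difference between the group means of Value, and the single X slope (0.515) is the average of the two per-group slopes (1.015 and 0.015). The sketch below checks both facts numerically. The equal-X condition is specific to this toy data set; with unbalanced X the shared slope is instead a weighted combination of the per-group slopes and the offset is no longer a plain difference in means.

``````
DF <- data.frame(Group = c("A", "A", "A", "B", "B", "B"),
                 X     = c(1, 2, 3, 1, 2, 3),
                 Value = c(1.01, 1.95, 3.04, 7.01, 6.95, 7.04))

# Additive model: one shared slope for X plus an offset for group B
fit_add <- lm(Value ~ X + Group, data = DF)

# Separate fits per group, as in the faceted plot
slope_A <- coef(lm(Value ~ X, data = DF, subset = Group == "A"))[["X"]]
slope_B <- coef(lm(Value ~ X, data = DF, subset = Group == "B"))[["X"]]

# Shared slope equals the mean of the per-group slopes (X is identical in both groups)
all.equal(coef(fit_add)[["X"]], (slope_A + slope_B) / 2)

# Group B offset equals the difference in group means of Value
mean_diff <- with(DF, mean(Value[Group == "B"]) - mean(Value[Group == "A"]))
all.equal(coef(fit_add)[["GroupB"]], mean_diff)
``````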

Fitting the subset where Group == A shows a small p value for the slope vs. X.

Fitting the subset where Group == B shows a large p value for the slope.
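If you want the per-group lines of the faceted plot from a single model, one option (not used in the fits here) is an interaction term, Value ~ X * Group, which lets both the intercept and the slope differ by group. In this sketch the X coefficient is group A's slope and X:GroupB is how much group B's slope differs from it, so their sum recovers the slope of the group B subset fit. The point estimates match the subset fits, but the p-values generally will not, because the interaction model estimates one residual variance from both groups and has different residual degrees of freedom.

``````
DF <- data.frame(Group = c("A", "A", "A", "B", "B", "B"),
                 X     = c(1, 2, 3, 1, 2, 3),
                 Value = c(1.01, 1.95, 3.04, 7.01, 6.95, 7.04))

# Interaction model: separate intercept and separate slope for each group
fit_int <- lm(Value ~ X * Group, data = DF)
cf <- coef(fit_int)

# Group A line: the baseline coefficients
c(intercept = cf[["(Intercept)"]], slope = cf[["X"]])

# Group B line: baseline plus the group-specific adjustments
c(intercept = cf[["(Intercept)"]] + cf[["GroupB"]],
  slope     = cf[["X"]] + cf[["X:GroupB"]])
``````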

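``````
library(ggplot2)

# Toy data: Value rises with X in group A but is flat (and about 5 higher) in group B
DF <- data.frame(Group = c("A", "A", "A", "B", "B", "B"),
                 X     = c(1, 2, 3, 1, 2, 3),
                 Value = c(1.01, 1.95, 3.04, 7.01, 6.95, 7.04))

# One regression line per group, as in the faceted plot
ggplot(DF, aes(x = X, y = Value)) + geom_point() +
  geom_smooth(formula = y ~ x, method = "lm") +
  facet_wrap(~ Group)
``````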

``````
# A single regression line fitted to both groups pooled together
ggplot(DF, aes(x = X, y = Value)) + geom_point() +
  geom_smooth(formula = y ~ x, method = "lm")
``````

``````
summary(lm(Value ~ X, data = DF))
#>
#> Call:
#> lm(formula = Value ~ X, data = DF)
#>
#> Residuals:
#>      1      2      3      4      5      6
#> -2.975 -2.550 -1.975  3.025  2.450  2.025
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)    3.470      3.351   1.035    0.359
#> X              0.515      1.551   0.332    0.757
#>
#> Residual standard error: 3.103 on 4 degrees of freedom
#> Multiple R-squared:  0.02681,    Adjusted R-squared:  -0.2165
#> F-statistic: 0.1102 on 1 and 4 DF,  p-value: 0.7566
summary(lm(Value ~ X + Group, data = DF))
#>
#> Call:
#> lm(formula = Value ~ X + Group, data = DF)
#>
#> Residuals:
#>      1      2      3      4      5      6
#> -0.475 -0.050  0.525  0.525 -0.050 -0.475
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.9700     0.6692   1.450  0.24304
#> X             0.5150     0.2898   1.777  0.17358
#> GroupB        5.0000     0.4732  10.567  0.00181 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.5795 on 3 degrees of freedom
#> Multiple R-squared:  0.9745, Adjusted R-squared:  0.9576
#> F-statistic: 57.41 on 2 and 3 DF,  p-value: 0.004063
summary(lm(Value ~ X, data = DF, subset = Group == "A"))
#>
#> Call:
#> lm(formula = Value ~ X, data = DF, subset = Group == "A")
#>
#> Residuals:
#>      1      2      3
#>  0.025 -0.050  0.025
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.03000    0.09354  -0.321   0.8024
#> X            1.01500    0.04330  23.440   0.0271 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.06124 on 1 degrees of freedom
#> Multiple R-squared:  0.9982, Adjusted R-squared:  0.9964
#> F-statistic: 549.5 on 1 and 1 DF,  p-value: 0.02714
summary(lm(Value ~ X, data = DF, subset = Group == "B"))
#>
#> Call:
#> lm(formula = Value ~ X, data = DF, subset = Group == "B")
#>
#> Residuals:
#>      4      5      6
#>  0.025 -0.050  0.025
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  6.97000    0.09354  74.512  0.00854 **
#> X            0.01500    0.04330   0.346  0.78770
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.06124 on 1 degrees of freedom
#> Multiple R-squared:  0.1071, Adjusted R-squared:  -0.7857
#> F-statistic:  0.12 on 1 and 1 DF,  p-value: 0.7877
``````

Created on 2021-01-17 by the reprex package (v0.3.0)
