Rick@starz and I (Richard Careaga) invite discussion: when, if at all, should `lm()` be avoided for binary variables, and what alternative techniques should be considered?

Four cases are considered:

- **Continuous against continuous**: both sides of the `lm()` formula are continuous variables.
- **Continuous against binary**: a continuous response on the left-hand side (**LHS**) with a binary predictor on the right-hand side (**RHS**).
- **Binary against continuous**: a binary **LHS** response with a continuous **RHS** predictor.
- **Binary against binary**: both the **LHS** and the **RHS** are binary.

As a motivating example, `mtcars` includes the four variables `mpg`, `drat`, `vs` and `am`. From `help(mtcars)`:

- `mpg`: Miles/(US) gallon
- `drat`: Rear axle ratio
- `vs`: Engine (0 = V-shaped, 1 = straight)
- `am`: Transmission (0 = automatic, 1 = manual)
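A quick base-R check, added here as an illustration rather than part of the original reprex, shows how these columns are stored: `vs` and `am` are ordinary numeric columns holding only 0 and 1, which is why `lm()` and `geom_smooth()` accept them like any other number. The object names `vars` and `n_distinct_vals` are illustrative.

```r
# Sketch: confirm which of the four mtcars columns are binary (coded 0/1)
vars <- c("mpg", "drat", "vs", "am")

# Count distinct values per column; exactly 2 distinct values marks a binary variable
n_distinct_vals <- sapply(mtcars[vars], function(x) length(unique(x)))
n_distinct_vals

# All four columns are stored as numeric; vs and am are not factors
sapply(mtcars[vars], class)

# Cross-tabulation of the two binary variables
table(vs = mtcars$vs, am = mtcars$am)
```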

**LHS** continuous and **RHS** continuous

In the usual case of ordinary least squares (OLS) linear regression, both the **LHS** response variable and the **RHS** predictor variable are continuous.

```
require(ggplot2)
#> Loading required package: ggplot2
mtcars |> ggplot(aes(drat,mpg)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
#> `geom_smooth()` using formula = 'y ~ x'
```

```
(fit <- lm(mpg ~ drat, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ drat, data = mtcars)
#>
#> Coefficients:
#> (Intercept) drat
#> -7.525 7.678
summary(fit)
#>
#> Call:
#> lm(formula = mpg ~ drat, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9.0775 -2.6803 -0.2095 2.2976 9.0225
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -7.525 5.477 -1.374 0.18
#> drat 7.678 1.507 5.096 1.78e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 4.485 on 30 degrees of freedom
#> Multiple R-squared: 0.464, Adjusted R-squared: 0.4461
#> F-statistic: 25.97 on 1 and 30 DF, p-value: 1.776e-05
par(mfrow = c(2,2))
plot(fit)
```
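With a single continuous predictor, the OLS estimates have a closed form: slope = cor(x, y) * sd(y) / sd(x), intercept = mean(y) - slope * mean(x), and R-squared = cor(x, y)^2. The sketch below, an illustrative addition with hypothetical object names `r`, `slope` and `intercept`, checks these identities against the `lm()` fit above; both `all.equal()` calls should return `TRUE`.

```r
# Sketch: closed-form simple OLS estimates compared with lm()
fit <- lm(mpg ~ drat, data = mtcars)

r <- cor(mtcars$drat, mtcars$mpg)
slope <- r * sd(mtcars$mpg) / sd(mtcars$drat)           # cor(x, y) * sd(y) / sd(x)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$drat)

# Hand-computed coefficients match lm()'s (Intercept) and drat estimates
all.equal(unname(coef(fit)), c(intercept, slope))

# In simple regression, Multiple R-squared is the squared correlation
all.equal(summary(fit)$r.squared, r^2)
```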

**LHS** continuous and **RHS** binary

```
require(ggplot2)
#> Loading required package: ggplot2
mtcars |> ggplot(aes(vs,drat)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
#> `geom_smooth()` using formula = 'y ~ x'
```

```
(fit2 <- lm(drat ~ vs, data = mtcars))
#>
#> Call:
#> lm(formula = drat ~ vs, data = mtcars)
#>
#> Coefficients:
#> (Intercept) vs
#> 3.3922 0.4671
summary(fit2)
#>
#> Call:
#> lm(formula = drat ~ vs, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.09929 -0.31472 -0.04929 0.23351 1.07071
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.3922 0.1150 29.492 <2e-16 ***
#> vs 0.4671 0.1739 2.686 0.0117 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.488 on 30 degrees of freedom
#> Multiple R-squared: 0.1938, Adjusted R-squared: 0.167
#> F-statistic: 7.214 on 1 and 30 DF, p-value: 0.01168
par(mfrow = c(2,2))
plot(fit2)
```
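With a 0/1 predictor, `lm()` is performing a two-group comparison: the intercept is the mean `drat` for V-shaped engines (`vs == 0`), the slope is the difference in means between straight and V-shaped engines, and the slope's t test is the pooled-variance two-sample t test. A minimal sketch checking this, with illustrative object names `group_means` and `tt`:

```r
# Sketch: lm() with a 0/1 predictor reproduces a two-group comparison
fit2 <- lm(drat ~ vs, data = mtcars)

# Mean of the continuous response within each engine type
group_means <- tapply(mtcars$drat, mtcars$vs, mean)

# Intercept = mean for vs == 0; slope = mean(vs == 1) - mean(vs == 0)
all.equal(unname(coef(fit2)),
          unname(c(group_means["0"], group_means["1"] - group_means["0"])))

# Pooled-variance two-sample t test; t.test() subtracts in the other order,
# so only the sign of the statistic differs from the lm() slope's t value
tt <- t.test(drat ~ vs, data = mtcars, var.equal = TRUE)
all.equal(abs(unname(tt$statistic)),
          abs(summary(fit2)$coefficients["vs", "t value"]))
```

If the diagnostic plots suggest the two groups have different spreads, Welch's t test (`t.test()` with its default `var.equal = FALSE`) is one alternative that drops the equal-variance assumption.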

**LHS** binary and **RHS** continuous

```
require(ggplot2)
#> Loading required package: ggplot2
mtcars |> ggplot(aes(drat,vs)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
#> `geom_smooth()` using formula = 'y ~ x'
```

```
(fit3 <- lm(vs ~ drat, data = mtcars))
#>
#> Call:
#> lm(formula = vs ~ drat, data = mtcars)
#>
#> Coefficients:
#> (Intercept) drat
#> -1.055 0.415
summary(fit3)
#>
#> Call:
#> lm(formula = vs ~ drat, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.7834 -0.2791 -0.1754 0.4283 0.9097
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.0552 0.5617 -1.879 0.0700 .
#> drat 0.4150 0.1545 2.686 0.0117 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.46 on 30 degrees of freedom
#> Multiple R-squared: 0.1938, Adjusted R-squared: 0.167
#> F-statistic: 7.214 on 1 and 30 DF, p-value: 0.01168
par(mfrow = c(2,2))
plot(fit3)
```
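With a 0/1 response, `lm()` fits what is usually called a linear probability model (LPM): fitted values are read as P(vs = 1), but nothing keeps them inside [0, 1], and because the conditional variance is p(1 - p) the errors cannot have constant variance. Logistic regression via `glm(family = binomial)` is one commonly used alternative. The sketch below, with illustrative object names `lpm`, `logit` and `grid`, compares the two; the `drat` grid deliberately extends past the observed range (about 2.76 to 4.93) to make the difference visible. It also shows why `fit2` and `fit3` print the same t value, R-squared and F statistic: in a one-predictor OLS fit, swapping LHS and RHS leaves the slope's t statistic unchanged.

```r
# Sketch: linear probability model vs. logistic regression for a 0/1 response
lpm   <- lm(vs ~ drat, data = mtcars)
logit <- glm(vs ~ drat, family = binomial, data = mtcars)

# In-sample fitted values from both models
range(fitted(lpm))
range(fitted(logit))

# Predicted probabilities over a grid that extrapolates beyond the data
grid <- data.frame(drat = seq(2.5, 5.5, by = 0.5))
grid$p_lpm   <- predict(lpm, newdata = grid)                      # unbounded straight line
grid$p_logit <- predict(logit, newdata = grid, type = "response") # always in (0, 1)
grid

# Swapping LHS and RHS in simple OLS leaves the slope's t statistic unchanged
t_lpm <- summary(lpm)$coefficients["drat", "t value"]
t_rev <- summary(lm(drat ~ vs, data = mtcars))$coefficients["vs", "t value"]
all.equal(t_lpm, t_rev)
```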

**LHS** binary and **RHS** binary

```
require(ggplot2)
#> Loading required package: ggplot2
mtcars |> ggplot(aes(am,vs)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
#> `geom_smooth()` using formula = 'y ~ x'
```

```
(fit4 <- lm(vs ~ am, data = mtcars))
#>
#> Call:
#> lm(formula = vs ~ am, data = mtcars)
#>
#> Coefficients:
#> (Intercept) am
#> 0.3684 0.1700
summary(fit4)
#>
#> Call:
#> lm(formula = vs ~ am, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.5385 -0.3684 -0.3684 0.4615 0.6316
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.3684 0.1159 3.180 0.00341 **
#> am 0.1700 0.1818 0.935 0.35704
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.505 on 30 degrees of freedom
#> Multiple R-squared: 0.02834, Adjusted R-squared: -0.004049
#> F-statistic: 0.875 on 1 and 30 DF, p-value: 0.357
par(mfrow = c(2,2))
plot(fit4)
```
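When both sides are binary, `lm()` reduces to a comparison of two proportions: the intercept 0.3684 is the share of straight engines among automatics (7 of 19), and the slope 0.1700 is that share subtracted from the share among manuals (7 of 13). The sketch below confirms this from the 2 x 2 table and, as alternatives one might consider, shows the odds ratio from a logistic regression and the usual tests for a 2 x 2 table (`fisher.test()`, `chisq.test()`). Object names such as `tab`, `p` and `logit4` are illustrative.

```r
# Sketch: with a 0/1 response and a 0/1 predictor, lm() compares two proportions
fit4 <- lm(vs ~ am, data = mtcars)

# 2 x 2 table of counts: rows are am, columns are vs
tab <- table(am = mtcars$am, vs = mtcars$vs)
tab

# Proportion with a straight engine (vs == 1) within each transmission group
p <- prop.table(tab, margin = 1)[, "1"]

# Intercept = P(vs = 1 | am = 0); slope = P(vs = 1 | am = 1) - P(vs = 1 | am = 0)
all.equal(unname(coef(fit4)), unname(c(p["0"], p["1"] - p["0"])))

# Alternatives: logistic regression (odds ratio) and tests for a 2 x 2 table
logit4 <- glm(vs ~ am, family = binomial, data = mtcars)
exp(coef(logit4)["am"])   # odds ratio, manual vs. automatic: (7/6) / (7/12) = 2
fisher.test(tab)$p.value
chisq.test(tab)$p.value
```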

<sup>Created on 2023-04-09 with reprex v2.0.2</sup>