Here's a simple example that illustrates my question:
df <- data.frame(y = rnorm(10), x = rnorm(10), z = sample(c("a","b"), size = 10, replace = TRUE))
Using the * operator gives me a regression of y on 1, 1[z = b], x, and 1[z = b]x.
> lm(data = df, y ~ as.factor(z)*x)
Call:
lm(formula = y ~ as.factor(z) * x, data = df)
Coefficients:
(Intercept) as.factor(z)b x as.factor(z)b:x
-0.2351 0.1524 0.2309 -0.2699
I would like to regress y on 1[z = a], 1[z = a]x, 1[z = b], and 1[z = b]x (with no constant term). This regression will produce the same fitted values as the one above, but the interpretation of the coefficients is different, and preferable in some cases. How can I specify the formula to do this in a single regression?
Thanks, but that's not quite what I want, because I also want to "remove the intercept" on the x term, so that the coefficients are za, zb, za:x, and zb:x. I guess I may have to construct them manually, as you suggest.
No, because that still gives a "main effect" for x plus the incremental difference for group b.
Here's how you would do what I want by hand (creating new variables, which is what I was hoping to avoid).
set.seed(1)
df <- data.frame(y = rnorm(10), x = rnorm(10),
z = factor(sample(c("a","b"), size = 10, replace = TRUE)))
df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")
# Max's solution
> lm(data = df, y ~ 0 + x*z)
Call:
lm(formula = y ~ 0 + x * z, data = df)
Coefficients:
x za zb x:zb
-0.44849 0.24849 -0.09732 0.59642
> lm(data = df, y ~ 0 + z + xa + xb)
Call:
lm(formula = y ~ 0 + z + xa + xb, data = df)
Coefficients:
za zb xa xb
0.24849 -0.09732 -0.44849 0.14793
My desired parameterization is equivalent to running two separate regressions subset by the value of z. But often it is useful to have all of the coefficients in a single regression.
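To make that equivalence concrete, here is a minimal sketch that compares the by-hand regression with two subset regressions. It assumes a balanced z built with rep(), so that both groups are guaranteed to have observations; that is a choice made for this illustration, not how z is drawn in the example above. The object names (pooled, fit_a, fit_b) are also just illustrative.

```r
set.seed(1)
# Assumption: z alternates a, b so each group has 5 rows
df <- data.frame(y = rnorm(10), x = rnorm(10),
                 z = factor(rep(c("a", "b"), times = 5)))

# Group-specific copies of x, built by hand as above
df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")

# Single regression with separate intercept and slope per group
pooled <- lm(y ~ 0 + z + xa + xb, data = df)

# Two separate regressions, one per value of z
fit_a <- lm(y ~ x, data = df, subset = z == "a")
fit_b <- lm(y ~ x, data = df, subset = z == "b")

# Intercept and slope from each subset fit, next to the pooled coefficients
by_group  <- c(coef(fit_a), coef(fit_b))
in_pooled <- coef(pooled)[c("za", "xa", "zb", "xb")]
same_coefs <- all.equal(unname(by_group), unname(in_pooled))

# The fitted values should also match the y ~ z * x parameterization
same_fitted <- all.equal(fitted(pooled), fitted(lm(y ~ z * x, data = df)))
```

Both same_coefs and same_fitted should come back TRUE, up to numerical tolerance. The standard errors will generally differ from the subset regressions, since the single regression pools the residual variance across groups.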
Yes, this was just intended as a simple MWE. My question is whether there's functionality within formula specification syntax that can be used to automate this in more complicated examples.
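For what it's worth, one formula that appears to give this parameterization without creating new columns is y ~ 0 + z + z:x. This is a sketch, not something confirmed in this thread. The idea is that, with no main-effect term for x in the formula, R codes z with a full set of indicators inside the z:x interaction. The data below again use a balanced z built with rep(), which is an assumption for the illustration, and the object names are hypothetical.

```r
set.seed(1)
# Assumption: z alternates a, b so each group has 5 rows
df <- data.frame(y = rnorm(10), x = rnorm(10),
                 z = factor(rep(c("a", "b"), times = 5)))

# By-hand version, for comparison
df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")
by_hand <- lm(y ~ 0 + z + xa + xb, data = df)

# Formula-only version: group indicators plus a separate x slope per group
by_formula <- lm(y ~ 0 + z + z:x, data = df)

# Coefficient names of the formula-only fit
coef_names <- names(coef(by_formula))

# Both fits list the coefficients in the same order, so compare by position
same_coefs <- all.equal(unname(coef(by_hand)), unname(coef(by_formula)))
```

If this works as expected, coef_names holds za, zb, za:x, and zb:x, and same_coefs is TRUE. The nesting shorthand y ~ 0 + z/x should expand to the same terms, and adding further covariates in the same way (for example z:w for another numeric variable w) would be the natural extension to more complicated models.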