How to tidy-ily program optional keywords to lm?

tcovert · December 8, 2017, 4:30am

I am trying to write a wrapper around lm that allows the user to specify the optional keywords to lm within the formula. For example, to estimate a linear model of y on x with weights w using the dataset d, I would like for the user to be able to write newlm(y ~ x | w, d).

As you might guess, the real problem I am working on is more complicated than this, but I am actually stuck on the step of transforming newlm(y ~ x | w, d) into a call to lm(y ~ x, data = d, weights = w).

While I think I understand how to extract the w part from the formula above, using the Formula package, I can't figure out how to tell lm to use it. My best guess looks like this:

newlm <- function(f, d) {
f <- Formula(f)
f1 <- formula(f, lhs = 1, rhs = 1)
f2 <- formula(f, lhs = 0, rhs = 2)
return(lm(f1, data = d, weights = f2)
}

I'm sure that weights = f2 isn't right, but I can't quite figure out what is. quo, enquo, substitute and a bunch of other things in http://dplyr.tidyverse.org/articles/programming.html I've tried don't quite work. My guess is that this is somehow possible, but I can't figure out exactly how.

Note: the examples above don't technically have any tidy in them, but my real problem involves using the variable w inside a bunch of dplyr expressions, so I have already successfully referred to w with a !!w step in those expressions.

Thanks in advance for any suggestions.

CorradoLanera · December 8, 2017, 5:40pm

the main problem is that lm do not see the object f2 at all

here my solution basicaly taken from here and by considering that f2 do not store the weight but it is f2[[2]] that have the (unevaluated) name of the object. So to get the weights you have to eval(f2[[2]]) and to let it seen by lm you have to include it in the data object.

Finally, you obtain quite the identical object but the call.

Cheers,
Corrado

library(Formula)

# <https://forum.posit.co/t/how-to-tidy-ily-program-optional-keywords-to-lm/3484>
# [FROM] newlm(y ~ x | w, d)
# [TO]   lm(y ~ x, data = d, weights = w)


## Prepare data
#
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
outcome <- c(ctl, trt)

w <- runif(20)
ddf <- data.frame(outcome = outcome, group = group)

## clean workspace
rm(group, outcome)

## Standard model
lm_standard <- lm(outcome ~ group, weights = w, data = ddf)

## Exploration
# f <- outcome ~ group | w
# f <- Formula(f)
# f2 <- formula(f, lhs = 0, rhs = 2)
# 
# f2
# f2[[2]]
# eval(f2[[2]])
 

## Definition of newlm()
newlm <- function(f, d) {
  f <- Formula(f)
  f1 <- formula(f, lhs = 1, rhs = 1)
  f2 <- formula(f, lhs = 0, rhs = 2)[[2]]
  d['.weights'] <- eval(f2)

  # solution taken from <https://goo.gl/rrhzxP>
  lm(f1, data = d, weights = .weights)
}

## New model
lm_new <- newlm(outcome ~ group | w, ddf)

## Check
all.equal(lm_standard, lm_new)
#> [1] "Component \"call\": target, current do not match when deparsed"
lm_standard$call
#> lm(formula = outcome ~ group, data = ddf, weights = w)
lm_new$call
#> lm(formula = f1, data = d, weights = .weights)
lm_standard$call <- NULL
lm_new$call <- NULL
identical(lm_standard, lm_new)
#> [1] TRUE