Understanding the R Formula Interface

I'm learning R and enjoying it, but I've come across the R Formula Interface and it seems to be a new language to learn. I'm a little bit stumped by it. I wonder if someone might point me at a resource that might help me understand and learn this particular dimension of R please? Can you help? Thanks in advance.

This handout will give you the details. To get a feeling, let's look at the use in linear regression.

fit1 <- lm(mpg ~ drat, data = mtcars)
fit2 <- lm(mpg ~ drat + cyl, data = mtcars)
fit3 <- lm(mpg ~ drat * cyl, data = mtcars)

fit4 <- lm(mpg ~ drat,data = mtcars)

fit4 <- lm(mpg ~ ., data = mtcars)

lm is called here with two arguments: a formula and data. In each formula, mpg (to the left of the ~) is the dependent variable (sometimes called the response), and data is the object in which mpg and the other variables are to be found.

In fit1, it's just the regression of mpg on one independent variable (sometimes called the treatment).

fit2 has two independent variables.

fit3 has two interacting variables (this is an intermediate-level topic).

fit4 is everything, all the other variables without having to type them out.
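If it helps to see what each formula expands to without fitting anything, here is a minimal sketch using terms() from base R. The helper name term_labels is made up for this illustration; it is not part of R.

```r
# term_labels() is a hypothetical helper for illustration only.
# It asks R which model terms a formula expands to.
term_labels <- function(f, data = NULL) {
  attr(terms(f, data = data), "term.labels")
}

term_labels(mpg ~ drat)               # "drat"
term_labels(mpg ~ drat + cyl)         # "drat" "cyl"
term_labels(mpg ~ drat * cyl)         # "drat" "cyl" "drat:cyl"

# The dot needs a data frame so R knows what "everything else" means.
term_labels(mpg ~ ., data = mtcars)   # the ten mtcars columns other than mpg
```

So * is shorthand for the two main effects plus their interaction (written drat:cyl), and . stands for every column in data that is not already on the left of the ~.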


Thanks for your response technocrat,
I ran each of your 4 linear regressions and observed the results. I must say I hadn't deduced what "drat" referred to? It appears to be the independent variable from what I've gathered so far, but I'm not sure what it is. I did download the pdf file you listed as the handout earlier today when I was googling this topic. I didn't find his exposition to be all that enlightening unfortunately, but I thank you for referring me to it.

I'm working through a book called "R for Dummies" and the first mention of this "language" of formula "modelling" was in Chapter 13, which meant I had to wade through a fair bit before I ran into this code. I can get the general idea that there is a language going on and I understand the notion of an independent and dependent variable in linear regression. But the ~ tilde symbol and other math symbols that I'm familiar with seem to have different meanings in this formula structure, e.g. the * asterisk refers to "crossing" of variables and the ^ symbol means something else. I found it confusing to say the least. But I'm willing to learn and I'm trying to find something that starts at a basic level and moves through at a steady incline. I appreciate your efforts, thank you.

drat is rear axle ratio.

Symbols have different meanings in different contexts. For example, + usually means addition of numbers, but it can also be hijacked to mean addition of certain types of plots.
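A small sketch of the same idea inside formulas, using only base R. The variable names y, a and b are placeholders and labels_of is a made-up helper; nothing is fitted here, we only ask R how it reads each formula.

```r
# labels_of() is a hypothetical helper: it lists the terms R sees in a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

1 + 2                        # ordinary arithmetic: 3

labels_of(y ~ a + b)         # "a" "b"         (+ means "include this term")
labels_of(y ~ (a + b)^2)     # "a" "b" "a:b"   (^2 means "up to two-way interactions")
labels_of(y ~ a * b - a:b)   # "a" "b"         (- means "drop this term")
labels_of(y ~ I(a^2))        # "I(a^2)"        (I() restores the arithmetic meaning)
```

Wrapping an expression in I() is the usual way to say "I really do mean the arithmetic operator here", for example to regress on the square of a variable.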

Rear axle ratio, well that clears that up. Where did you learn this Formula Interface? The best I've got at the moment is trial and error, I'll keep battling away with it. Your posts have been helpful, I want to know the full ins and outs of this language. Thanks for your help technocrat!

Well, sometime since 2007, when I started using R or maybe since 1965 when I started programming? :grin: :older_man:


Some other resources:


Thanks for your response. Much appreciated. I shall follow through on your suggestions. Some extra resources will be helpful.

Hi @davegoodo,
You may find the following help pages useful (although the language used is pretty terse):


Thanks for your help, I'll follow through on your suggestions,

Thanks for the explanation! However, I have trouble finding the difference between the fit1 and fit4 formulas. I only see a space being different, but I thought R ignores spaces.


The fit4 above is identical to fit1; the space is indeed irrelevant.
Based on the description accompanying fit4, it was likely intended to be:

fit4 <- lm(mpg ~ .,data = mtcars)
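A quick way to check both points yourself, as a rough sketch. The names fit4_as_posted and fit4_intended are invented here just to keep the two versions apart.

```r
fit1           <- lm(mpg ~ drat, data = mtcars)
fit4_as_posted <- lm(mpg ~ drat,data = mtcars)   # only the whitespace differs
fit4_intended  <- lm(mpg ~ ., data = mtcars)     # every other column as a predictor

# Same coefficients, so the missing space changed nothing.
all.equal(coef(fit1), coef(fit4_as_posted))      # TRUE

length(coef(fit1))            # 2: intercept and drat
length(coef(fit4_intended))   # 11: intercept and the ten other mtcars columns
names(coef(fit4_intended))
```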

Just so. I plead C₈H₁₀N₄O₂ deficiency disorder.

Hope this cup of joe treats you well.

It's valuable to notice the variability of the meanings attached to symbols. As well as the alternate usage of + that you caught, it's also hijacked in the slick {ggplot2} package to modify plot objects.

Oh thanks, that makes sense!
