Trying to run a linear model

#' ---
#' output:
#'   md_document:
#'     pandoc_args: [
#'       '-f', 'markdown-implicit_figures',
#'       '-t', 'commonmark',
#'       --wrap=preserve
#'     ]
#' ---

#+ reprex-setup, include = FALSE
options(tidyverse.quiet = TRUE)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", error = TRUE)
knitr::opts_knit$set(upload.fun = knitr::imgur_upload)

#+ reprex-body

Created by Paul A. Gureghian on 8/16/2018


Teams %>% filter(yearID %in% 1961:2001) %>%
mutate(HR_per_game = HR/G, R_per_game = R/G)

fit <- lm(R_per_game ~ BB, data = Teams)  

#' Created on `r Sys.Date()` by the reprex package (v`r utils::packageVersion("reprex")`).

I'm trying to fit lm() on the "Teams" object to get the coefficient of BB, but I get the error message: R_per_game not found. When I run head() on "Teams" I see the R_per_game column, though.

Did you assign the result of this pipe workflow back to the Teams table (or to another name)?

You created the column inside this pipe workflow with mutate(). But if you want to use it in lm() outside the pipe, you need to store a data frame that contains the new column. In your lm() call you put data = Teams, and it seems that this table does not contain the column you want.
Can you check that?
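
To make the point concrete, here is a minimal sketch with a small made-up data frame (teams_toy and dat are hypothetical names, and the numbers are invented; this is not the real Teams table). It only illustrates that a pipe returns a new data frame and leaves its input unchanged unless you assign the result:

```r
library(dplyr)

# Hypothetical toy data standing in for the Teams table
teams_toy <- data.frame(
  yearID = c(1960, 1961, 1990, 2001, 2002),
  G      = c(154, 162, 162, 162, 162),
  R      = c(600, 700, 750, 800, 820),
  BB     = c(450, 500, 550, 600, 610)
)

# The pipe prints a new data frame; teams_toy itself is not modified
teams_toy %>%
  filter(yearID %in% 1961:2001) %>%
  mutate(R_per_game = R / G)

"R_per_game" %in% names(teams_toy)  # FALSE: the result was never assigned

# Assign the result to keep the new column
dat <- teams_toy %>%
  filter(yearID %in% 1961:2001) %>%
  mutate(R_per_game = R / G)

"R_per_game" %in% names(dat)        # TRUE

# lm() can now find the column
fit <- lm(R_per_game ~ BB, data = dat)
coef(fit)
```

The printed output of the first pipe shows the new column, which may be why it looked as if the table already had it.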


So store the piped "Teams" result in a different variable (store <- Teams %>% ...), and then: lm(formula, data = store)?


It worked. What is the "best practice" here: use a different variable, no pipe, then data = new variable?


One option is to continue the pipe into the lm function, using the dot operator where you want your data to be inserted.


Teams %>% filter(yearID %in% 1961:2001) %>%
  mutate(HR_per_game = HR/G, R_per_game = R/G) %>%
  lm(R_per_game ~ BB, .)
#> Call:
#> lm(formula = R_per_game ~ BB, data = .)
#> Coefficients:
#> (Intercept)           BB  
#>    2.582098     0.003396
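
The dot is needed because lm() takes the formula as its first argument, while %>% fills the first argument by default. A minimal sketch of the same idea with the data argument named explicitly, using the built-in mtcars data as a stand-in (my choice here, so it runs without the Teams table):

```r
library(dplyr)

# mtcars is only a stand-in for the prepared Teams data
fit <- mtcars %>%
  mutate(wt_kg = wt * 453.6) %>%  # wt is recorded in units of 1000 lb
  lm(mpg ~ wt_kg, data = .)       # `.` marks where the piped data goes

coef(fit)
```

Naming the argument (data = .) is optional, but it makes it easier to see at a glance where the piped data frame ends up.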

I think you should

  1. import your data
  2. prepare your data
  3. model
  4. communicate

So preparing your data with the right columns for the model can be its own step, with the result stored in another variable. Piping into lm() with . works, but it mixes the prepare and model steps.
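
A rough sketch of those four steps as separate stages, assuming Teams comes from the Lahman package (the thread does not say so explicitly) and using teams_61_01 as a made-up name for the prepared table:

```r
library(Lahman)  # assumed source of the Teams table
library(dplyr)

# 1. import
data("Teams", package = "Lahman")

# 2. prepare: keep the result under its own name
teams_61_01 <- Teams %>%
  filter(yearID %in% 1961:2001) %>%
  mutate(HR_per_game = HR / G, R_per_game = R / G)

# 3. model
fit <- lm(R_per_game ~ BB, data = teams_61_01)

# 4. communicate
coef(fit)["BB"]
```

Keeping the prepared table around also means you can inspect it, or reuse it for other models, without rerunning the pipe.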

You should take a look at

and specifically the part about Model. It could help you with good practice.

Also, if your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:


By the way, speaking of best practice, I would highly recommend that you try to keep your threads in code wherever possible (vs. screenshot) to make it easier for people to replicate your process quickly and precisely without having to retype from a picture. Including the actual code can also make it much easier to catch typos.

Two good ways to do this:

  1. Select your code in the console and copy into your post. Then highlight your code and click the </> button on the toolbar of the reply window to make it format nicely.

  2. Select your code and use the reprex package to generate your code. It looks like you gave a shot at that in the original post, but something looks off (for example most of it is not pre-formatted as code), and I'm not sure how to diagnose it. In that case #1 should still work...


I haven't done a reprex in a while and I guess I'm rusty on it. I only use screenshots to show what I am looking at, not necessarily for troubleshooting or reproduction.

The reprex package helps you make sure that your code contains everything you need for your example.

You can also do this without the reprex package, e.g. like so:

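library(tidyverse)  # provides tibble(), %>% and ggplot()

# The body of my_func is missing from the post; rnorm() is an assumed stand-in
my_func = function(n){
  rnorm(n)
}
n = 100
d = tibble(x = sort(my_func(n)), y = my_func(n))
d %>%
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  theme_bw() +
  ggtitle("My Random Plot")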

Spend a little extra time formatting and checking. It helps us help you, and it makes it a lot easier for future googlers to use a thread to learn how to solve a particular challenge. Those googlers won't find the thread if you use screenshots, so please refrain unless the question is strictly GUI oriented :slightly_smiling_face: