I have a data frame that contains the coordinates of each grid cell, a year column, and several variables that I want to include in a linear regression. I want to fit a simple linear regression on a 30-year time series for each grid cell (x, y coordinate pair). So far I have grouped the dataset by x and y and nested all other variables, so that each row has one column each for the x and y coordinates and one column containing a data frame with the response and predictor variables and a variable indicating the year.
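To make the layout concrete, here is a minimal sketch with simulated data: two hypothetical grid cells, 30 years each, and made-up predictors `pred1` and `pred2` with the response `indep_var`. It assumes dplyr and tidyr (1.0 or later, for `expand_grid()` and the grouped `nest()` behaviour).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 2 grid cells x 30 years; pred1, pred2 and the
# response indep_var are simulated, not taken from the question
set.seed(1)
df <- expand_grid(x = c(10.5, 11.5), y = 50.5, year = 1991:2020) %>%
  mutate(
    pred1 = rnorm(n()),
    pred2 = rnorm(n()),
    indep_var = 2 + 0.5 * pred1 - 0.3 * pred2 + rnorm(n(), sd = 0.1)
  )

# One row per (x, y) pair; every other column moves into the list-column `data`
nested <- df %>%
  group_by(x, y) %>%
  nest() %>%
  ungroup()

nested               # 2 rows: x, y, data
nested$data[[1]]     # 30 rows: year, pred1, pred2, indep_var
```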
First I load the data frame `df` from an .rds file. Then I group the dataset and nest all values per coordinate pair:
```r
library(dplyr)
library(tidyr)

model <- df %>%
  mutate_at('year', as.numeric) %>%
  dplyr::select(-year) %>%
  group_by(x, y) %>%
  nest() %>%
  lm(indep_var ~ ., data = df)  # this is where it does not work
```
I also tried `lm(indep_var ~ ., data = df$data)`, since the column that holds the nested variables is called `data`, but this does not work either. The first option gives the error `Error in model.frame.default(formula = ., data = df, subset = indep_var ~ : invalid type (closure) for the variable 'data'`. The second option gives the error `Error in eval(predvars, data, env) : object 'x' not found`.
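It's hard to be sure because I'm not familiar with your data, but one thing I notice is that piping into `lm()` passes the nested data frame as the first argument of `lm()`, which is `formula`. The first argument to `lm()` should be the formula and the second should be the data; in your call the formula ends up in the next positional argument, `subset`, which is exactly what the first error message shows (`formula = ., data = df, subset = indep_var ~ ...`).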
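A minimal sketch of one way to fit one model per grid cell, assuming dplyr, tidyr (1.0 or later) and purrr: `map()` walks over the nested `data` column and calls `lm()` with the formula first and each cell's own data frame as `data`. The toy data and the column names `pred1` and `pred2` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data in the shape described in the question
set.seed(1)
df <- expand_grid(x = c(10.5, 11.5), y = 50.5, year = 1991:2020) %>%
  mutate(
    pred1 = rnorm(n()),
    pred2 = rnorm(n()),
    indep_var = 2 + 0.5 * pred1 - 0.3 * pred2 + rnorm(n(), sd = 0.1)
  )

models <- df %>%
  select(-year) %>%   # as in the question: year is not used as a predictor
  group_by(x, y) %>%
  nest() %>%
  ungroup() %>%
  # formula first, then this grid cell's nested data frame as `data`
  mutate(model = map(data, function(d) lm(indep_var ~ ., data = d)))

models$model[[1]]
```

Each element of `models$model` is an ordinary `lm` object for one (x, y) pair, so `summary()` or `coef()` can be applied to any of them.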
If I understand what you're trying to do, this will be useful:
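A sketch of that idea on the built-in `iris` data, fitting one regression per species; it assumes dplyr, tidyr, purrr and broom, and the choice of `Sepal.Length` as the response is only illustrative. The object names `byspecies`, `model` and `coef` match the ones used in the explanation that follows.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

byspecies <- iris %>%
  group_by(Species) %>%
  nest() %>%                 # one row per species, measurements in `data`
  ungroup() %>%
  mutate(
    # one lm per species, using the other three measurements as predictors
    model = map(data, function(d) lm(Sepal.Length ~ ., data = d)),
    # one coefficient table per model
    coef  = map(model, tidy)
  )

coefs <- byspecies %>%
  select(Species, coef) %>%
  unnest(coef)

print(coefs)
```

For the question's data, grouping by `x, y` instead of `Species` and using `indep_var` as the response gives the per-grid-cell version.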
Sure. `tidy()` is a function from the broom package and a popular way to get data frames of coefficients; it works with a variety of model types. `coef()` gives a named vector of coefficients, so `tidy()` is a convenient alternative. `map(model, tidy)` loops over the models we fitted and turns each one into a data frame. Finally, `byspecies %>% select(Species, coef) %>% unnest(coef) %>% print()` prints a single data frame with the coefficients from all the models.
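For comparison, a small sketch of the two outputs on a single model (the `iris` formula is only an illustration, assuming broom is installed): `coef()` returns a named numeric vector, while `tidy()` returns a tibble with one row per term, which is what makes it easy to stack results from many models.

```r
library(broom)

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

cf <- coef(fit)   # named numeric vector: (Intercept), Sepal.Width, Petal.Length
cf

td <- tidy(fit)   # tibble with columns term, estimate, std.error, statistic, p.value
td
```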