Hello! Any help with my question would be greatly appreciated, and I thank you for your time in advance. I am fairly new to R, so apologies if this is a simple question. Also, I checked for duplicate posts and found a similar post, but my question is slightly different because it deals with the data in a long format.
Here are screenshots of the data I am working with:
The columns I am interested in are 'dma', 'weekly_deaths', and 'gtrends'. I would like to regress the gtrends data onto the weekly_deaths data for each unique 'dma'. For example, I would like to fit a simple linear regression model gtrends ~ weekly_deaths on all of the rows with dma = 1, then do the same for dma = 2, and so on. The rows for each dma are contiguous, as seen in the second screenshot (the dma = 2 data starts after the last row of the dma = 1 data).
There are 210 dmas total, which is why I would like a loop to do the regressions for me instead of running 210 separate ones.
Ultimately, I would like the loop to give me the linear regression coefficient, p-value, and multiple r-squared for each dma regression.
I have tried the following loop:
for (i in 1:160) {
  reg <- lm(weekly_deaths ~ gtrends, data = subset(total_gtrends_deaths_df, dma == i))
}
to no avail (again, I am very new to R, so apologies if these are bad attempts).
Does anyone have any suggestions? Thank you so much! I really appreciate any and all help.
I have switched to nest_by(), another "experimental" function in {dplyr}. It "returns a rowwise data frame, which makes operations on the grouped data particularly elegant."
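Since the screenshots cannot be copied, here is a minimal sketch of the nest_by() approach on made-up data. The toy_df data frame and its values are hypothetical stand-ins for total_gtrends_deaths_df; only the column names come from the question. It fits one model per dma and pulls out the slope, its p-value, and the multiple R-squared.

```r
library(dplyr)

# Hypothetical stand-in for total_gtrends_deaths_df: two dma values, long format
set.seed(1)
toy_df <- data.frame(
  dma     = rep(1:2, each = 20),
  gtrends = runif(40, 0, 100)
)
toy_df$weekly_deaths <- 5 + 0.3 * toy_df$gtrends + rnorm(40)

results <- toy_df %>%
  nest_by(dma) %>%                                   # one row per dma; the rest sits in list-column `data`
  mutate(model = list(lm(weekly_deaths ~ gtrends, data = data))) %>%
  summarise(
    slope     = coef(model)[["gtrends"]],            # regression coefficient for gtrends
    p_value   = summary(model)$coefficients["gtrends", "Pr(>|t|)"],
    r_squared = summary(model)$r.squared             # multiple R-squared
  ) %>%
  ungroup()

print(results)
```

With the real data, replacing toy_df with total_gtrends_deaths_df should give one row per dma, assuming every dma has enough non-missing rows to fit a model.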
You were actually not far off. First, specify total_gtrends_deaths_df just once at the start and let the pipe do the work. Otherwise, I think it will group the data but then use the original ungrouped data for each variable. I would wrap the model in tidy() inside do() and put the . at the end of lm() as its data argument, so that each group's rows are passed to the model. Without the latter, you will get an error message that it cannot find your variables. I filtered to show only the slope coefficients and not the intercepts.
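A rough sketch of that group_by() and do() pipeline, again on hypothetical data (toy_df and its values are invented; only the column names are from the question). The glance() step is my own addition for the R-squared and is not part of the description above.

```r
library(dplyr)
library(broom)

# Hypothetical stand-in for total_gtrends_deaths_df
set.seed(1)
toy_df <- data.frame(
  dma     = rep(1:2, each = 20),
  gtrends = runif(40, 0, 100)
)
toy_df$weekly_deaths <- 5 + 0.3 * toy_df$gtrends + rnorm(40)

# One tidy() coefficient table per dma; `.` is the current group's data
slopes <- toy_df %>%
  group_by(dma) %>%
  do(tidy(lm(weekly_deaths ~ gtrends, data = .))) %>%
  ungroup() %>%
  filter(term == "gtrends")        # keep slopes, drop intercepts

# Assumption: glance() is used here to get the multiple R-squared per dma
fit_stats <- toy_df %>%
  group_by(dma) %>%
  do(glance(lm(weekly_deaths ~ gtrends, data = .))) %>%
  ungroup() %>%
  select(dma, r.squared)

print(slopes)
print(fit_stats)
```

The estimate and p.value columns of slopes hold the coefficient and its p-value for each dma.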
It is always helpful to supply a sample of your data, perhaps for two of the dma values. We cannot copy and paste data to test with from a screenshot.
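One common way to share such a sample is dput(), which prints R code that recreates the object. A small sketch, using a made-up toy_df in place of your data frame (the values are invented):

```r
# Hypothetical stand-in for total_gtrends_deaths_df
toy_df <- data.frame(
  dma           = rep(1:3, each = 3),
  weekly_deaths = c(10, 12, 9, 20, 18, 25, 7, 6, 8),
  gtrends       = c(40, 55, 38, 70, 66, 81, 22, 19, 30)
)

# Keep only two dma values and the three columns of interest
sample_rows <- subset(toy_df, dma %in% c(1, 2),
                      select = c(dma, weekly_deaths, gtrends))

# Paste the printed output into the question so others can rebuild the data
dput(sample_rows)
```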