I am trying to write a loop that runs a regression model on energy data. There are columns for the state, the MSN (type of energy), and the years 1960-1990 in 5-year steps and 1991-2017 yearly. I tested my model for 1960 and it worked perfectly. However, I have a problem writing the for loop that runs the regressions on the columns/years 1961-2017. Here's a view of the dataset itself:
That's the problematic passage of the code; the model for 1960 works fine!
This is usually an indication that your data is not in a shape appropriate for analysis.
This is why spreadsheets work OK for simple analyses but not for more complex ones.
Time spent at the beginning reshaping your data can make a lot of later plotting and modeling much easier. You can find many resources by searching for [wide versus long data in r] or variants of that. Here is one quick thing I found that could give you a start.
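A minimal sketch of that reshaping in base R, assuming Data_Total has one row per State x MSN and one column per year named like X1960. The toy data below is made up for illustration. The first reshape() call turns the year columns into rows (State, MSN, Year, Value); the second spreads MSN back into columns, so each row is one state in one year and each energy type is its own column. tidyr's pivot_longer() and pivot_wider() do the same job; base reshape() is used here only to avoid extra packages.

```r
# Hypothetical toy data in the same wide layout as Data_Total:
# one row per State x MSN, one column per year.
Data_Total <- data.frame(
  State = factor(rep(c("AK", "AL"), each = 2)),
  MSN   = factor(rep(c("KILOM", "PAACB"), times = 2)),
  X1960 = c(10, 20, 30, 40),
  X1965 = c(11, 21, 31, 41)
)

# Year columns are assumed to be named "X" followed by four digits.
year_cols <- grep("^X[0-9]{4}$", names(Data_Total), value = TRUE)

# Step 1: wide -> long, one row per State x MSN x Year.
long <- reshape(Data_Total,
                direction = "long",
                varying   = year_cols,
                v.names   = "Value",
                timevar   = "Year",
                times     = as.integer(sub("^X", "", year_cols)),
                idvar     = c("State", "MSN"))

# Step 2: spread MSN into columns, one row per State x Year
# (columns Value.KILOM, Value.PAACB, ...).
by_msn <- reshape(long,
                  direction = "wide",
                  idvar     = c("State", "Year"),
                  timevar   = "MSN",
                  v.names   = "Value")
```

In this shape, the regression for one year is a single lm() call on subset(by_msn, Year == 1960), with ordinary column names instead of which(MSN == ...) indexing.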
I guess the code for reading and preparing the data is irrelevant for the purpose of this question.
The columns "State" and "MSN" are factors. All others are numeric. I called this data frame "Data_Total".
Here are the libraries used in my program; I just copied all of them from my "introduction to R" class at university.
This is the regression model for 1960; its results match what I got for this year in Excel.
#I added 1 to each term to avoid log(0), which is undefined. That does
#not matter for my purpose.
Model_1960 <- lm(log(X1960[which(MSN=="KILOM")] + 1) ~
                   log(X1960[which(MSN=="PAACB")] + 1) +
                   log(X1960[which(MSN=="NGACB")] + X1960[which(MSN=="PQACB")] + 1) +
                   log(X1960[which(MSN=="EMACB")] + X1960[which(MSN=="ESACB")] + 1),
                 data = Data_Total)
summary(Model_1960)
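One thing worth checking with this style of model: X1960[which(MSN=="KILOM")] and X1960[which(MSN=="PAACB")] are paired purely by position, so the model only makes sense if every MSN block lists the same states in the same order. A minimal sketch of a check, with made-up toy data and a hypothetical helper check_alignment(); a state missing from one MSN block also makes it return FALSE, and sorting cannot fix that case.

```r
# Hypothetical toy data whose MSN blocks list the states in different orders.
Data_Total <- data.frame(
  State = factor(c("AL", "AK", "AK", "AL")),
  MSN   = factor(c("KILOM", "KILOM", "PAACB", "PAACB")),
  X1960 = c(1, 2, 3, 4)
)

# Hypothetical helper: TRUE if every MSN block has the same State sequence.
check_alignment <- function(df) {
  states_by_msn <- split(as.character(df$State), df$MSN)
  all(vapply(states_by_msn, identical, logical(1), states_by_msn[[1]]))
}

check_alignment(Data_Total)   # FALSE: KILOM lists AL, AK but PAACB lists AK, AL

# Sort by MSN, then State, so positional pairing matches the same state.
Data_Total <- Data_Total[order(Data_Total$MSN, Data_Total$State), ]
check_alignment(Data_Total)   # TRUE
```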
Then what I want to do is run this regression for every column (X1960 to X1990) to get the estimates for the respective years and save them in a table to use later.
#That's the main problem I have. I want to use a for loop to run the regression model on all of the years. The
#columns of the data set are 1960-1990 in five-year steps and 1991-2017 yearly.
#I would also like to store the results of each year's regression to export them to Excel later or reuse them
#in other models.
df_list <- colnames(Data_Total)[3:8]   # names of the year columns to loop over
coeff_list <- list()                   # one coefficient matrix per year, named by column
for (i in df_list) {
  x   <- Data_Total[[i]]               # [[i]] extracts the column whose name is stored in i
  msn <- Data_Total$MSN
  Model_i <- lm(log(x[which(msn == "KILOM")] + 1) ~
                  log(x[which(msn == "PAACB")] + 1) +
                  log(x[which(msn == "NGACB")] + x[which(msn == "PQACB")] + 1) +
                  log(x[which(msn == "EMACB")] + x[which(msn == "ESACB")] + 1))
  print(summary(Model_i))              # inside a loop, results are only shown if printed explicitly
  coeff_list[[i]] <- summary(Model_i)$coefficients[1:4, 1:4]
}
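To end up with one table covering all years (for reuse or for opening in Excel), one option is to bind each year's coefficient matrix into a single long data frame with Year and Term columns and write it with write.csv(). This is a sketch under assumptions: the toy data is randomly generated, fit_year() is a hypothetical helper wrapping the same model as the loop above, and the short names pa, ng, em are made up for readability. It also assumes the rows within each MSN block are in the same State order.

```r
set.seed(1)
# Hypothetical toy data: 10 states x the six MSN codes from the question, two year columns.
states <- paste0("S", 1:10)
msns   <- c("KILOM", "PAACB", "NGACB", "PQACB", "EMACB", "ESACB")
Data_Total <- expand.grid(State = states, MSN = msns)
Data_Total$X1960 <- runif(nrow(Data_Total), 0, 100)
Data_Total$X1965 <- runif(nrow(Data_Total), 0, 100)

# Hypothetical helper: the same model as in the question, for one year column x.
fit_year <- function(x, msn) {
  pick <- function(code) x[which(msn == code)]
  d <- data.frame(y  = log(pick("KILOM") + 1),
                  pa = log(pick("PAACB") + 1),
                  ng = log(pick("NGACB") + pick("PQACB") + 1),
                  em = log(pick("EMACB") + pick("ESACB") + 1))
  lm(y ~ pa + ng + em, data = d)
}

year_cols <- grep("^X[0-9]{4}$", names(Data_Total), value = TRUE)

# One small data frame per year: Year, Term, Estimate, Std. Error, t value, Pr(>|t|).
rows <- lapply(year_cols, function(yr) {
  est <- summary(fit_year(Data_Total[[yr]], Data_Total$MSN))$coefficients
  data.frame(Year = as.integer(sub("^X", "", yr)),
             Term = rownames(est),
             est, row.names = NULL, check.names = FALSE)
})
results <- do.call(rbind, rows)

# A CSV file that Excel can open; the path here is just a temporary location.
out_file <- file.path(tempdir(), "regression_results.csv")
write.csv(results, out_file, row.names = FALSE)
```

Keeping the table long (one row per year and term) makes it easy to filter or plot a single coefficient over time later.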
I hope this is all the information needed to reproduce my problem and, hopefully, resolve it!
Wrong guess. Apart from switching from a screenshot of the data to console output pasted as text, you haven't changed much. We need to create that data in our environment, and you cannot expect us to type it ourselves.
Your code is supposed to be minimal. Have you used all these packages for the problem in this thread? Most probably not. I guess not a single one is used, except maybe readr, which is part of the tidyverse.
I disagree. Reproducible means we will have exactly the same setup in our environment as you have. For example, we can guess that you may have read the data from a CSV file using read.csv or something similar, but there is no way for us to be sure about that.
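The usual way to share data so others can recreate it is dput(), which prints R code that rebuilds the object exactly, including factor levels. A minimal sketch; the small data frame here is a made-up stand-in for Data_Total, and head(..., 20) is just one way to keep the pasted output short.

```r
# Hypothetical small stand-in for Data_Total.
Data_Total <- data.frame(
  State = factor(c("AK", "AK")),
  MSN   = factor(c("KILOM", "PAACB")),
  X1960 = c(10, 20)
)

# Prints a structure(list(...)) expression; paste that text into the question.
dput(head(Data_Total, 20))
```

Anyone reading the question can then run Data_Total <- followed by the pasted output and get the same data frame.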
I'm sorry that I could not meet the requirements that you requested.
As I'm totally new to R and this community I did not completely understand what you asked for.
However I tried your approach for my model and it worked out perfectly!
Thank you very much for your help and sacrificing your free time to resolve my issue!