Hi!
I'm trying to build a decision tree regression model using a dataset of 260 rows and 56 columns (1 index column, 1 target variable, and 54 predictors).
I split the data into training and testing sets with these lines:
dt_WTI <- sort(sample(nrow(DataWTI_Lag1), nrow(DataWTI_Lag1)*.8))
dt_train_WTI<-DataWTI_Lag1[dt_WTI,]
dt_test_WTI<-DataWTI_Lag1[-dt_WTI,]
Training data has 208 rows and testing has 52.
I want to model WTI price against one predictor at a time, rather than all 54 at once, so that I can compare the RMSE values and choose the best lag for each predictor.
The first model I'm trying to build is price as a function of USDX. So I built it like this.
dtWTI_Lag1 <- rpart(dt_train_WTI$WTIPrice ~ dt_train_WTI$USDX)
summary(dtWTI_Lag1)
I then tried to predict on the testing data with these lines:
predictor <- as.data.frame(dt_test_WTI[,c(3)])
colnames(predictor) <- "USDX"
prediction <- predict(dtWTI_Lag1,predictor)
But a warning message showed up.
Warning message:
'newdata' had 52 rows but variables found have 208 rows
Here's the result of the prediction (208 values, one per training row, instead of 52):
1 2 3 4 5 6 7 8 9 10 11 12
48.02825 48.02825 37.75662 48.02825 48.02825 48.02825 48.02825 37.75662 37.75662 48.02825 48.02825 37.75662
13 14 15 16 17 18 19 20 21 22 23 24
37.75662 37.75662 37.75662 54.38706 54.38706 54.38706 54.38706 54.38706 58.26438 58.26438 58.26438 67.98679
25 26 27 28 29 30 31 32 33 34 35 36
51.58749 51.58749 51.58749 63.08409 51.58749 51.58749 51.58749 67.98679 67.98679 67.98679 51.58749 51.58749
37 38 39 40 41 42 43 44 45 46 47 48
67.98679 51.58749 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409
49 50 51 52 53 54 55 56 57 58 59 60
63.08409 63.08409 63.08409 63.08409 51.58749 51.58749 67.98679 67.98679 67.98679 67.98679 67.98679 67.98679
61 62 63 64 65 66 67 68 69 70 71 72
67.98679 67.98679 67.98679 67.98679 67.98679 67.98679 58.26438 67.98679 67.98679 67.98679 67.98679 67.98679
73 74 75 76 77 78 79 80 81 82 83 84
67.98679 45.54071 54.38706 54.38706 54.38706 58.26438 58.26438 58.26438 58.26438 54.38706 45.54071 58.26438
85 86 87 88 89 90 91 92 93 94 95 96
54.38706 45.54071 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706
97 98 99 100 101 102 103 104 105 106 107 108
54.38706 54.38706 54.38706 58.26438 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706
109 110 111 112 113 114 115 116 117 118 119 120
54.38706 54.38706 37.75662 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706 54.38706
121 122 123 124 125 126 127 128 129 130 131 132
45.54071 54.38706 54.38706 54.38706 54.38706 54.38706 37.75662 37.75662 54.38706 45.54071 45.54071 48.02825
133 134 135 136 137 138 139 140 141 142 143 144
37.75662 37.75662 37.75662 37.75662 37.75662 37.75662 37.75662 37.75662 37.75662 54.38706 45.54071 54.38706
145 146 147 148 149 150 151 152 153 154 155 156
54.38706 54.38706 45.54071 58.26438 67.98679 51.58749 51.58749 51.58749 51.58749 51.58749 51.58749 51.58749
157 158 159 160 161 162 163 164 165 166 167 168
51.58749 67.98679 51.58749 51.58749 51.58749 51.58749 51.58749 63.08409 63.08409 63.08409 63.08409 63.08409
169 170 171 172 173 174 175 176 177 178 179 180
63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409
181 182 183 184 185 186 187 188 189 190 191 192
63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 63.08409 51.58749 63.08409 63.08409 51.58749 51.58749
193 194 195 196 197 198 199 200 201 202 203 204
51.58749 63.08409 63.08409 51.58749 67.98679 67.98679 67.98679 67.98679 67.98679 67.98679 67.98679 58.26438
205 206 207 208
58.26438 58.26438 58.26438 58.26438
Can somebody help me with this problem? I'm fairly new to RStudio and would greatly appreciate help from an expert. Thank you!
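A likely cause of the warning is the formula in the rpart() call. Because it refers to dt_train_WTI$WTIPrice and dt_train_WTI$USDX directly, predict() looks up those same 208-row training vectors again and never uses the USDX column in newdata. The sketch below shows the usual fix (bare column names plus data =), then computes the test RMSE and repeats the fit for one predictor at a time. It is only a sketch under assumptions: the data frame is a synthetic stand-in, since the real DataWTI_Lag1 is not available, and the column names Index and Brent are hypothetical; only WTIPrice and USDX come from the post.

```r
library(rpart)

set.seed(42)  # reproducible split for this sketch

# Synthetic stand-in for DataWTI_Lag1 (made-up values; Index and Brent are
# hypothetical column names, WTIPrice and USDX follow the post)
n <- 260
DataWTI_Lag1 <- data.frame(
  Index    = seq_len(n),
  WTIPrice = 50 + rnorm(n, sd = 10),
  USDX     = 95 + rnorm(n, sd = 3),
  Brent    = 55 + rnorm(n, sd = 10)
)

# Same 80/20 split as in the post
dt_WTI       <- sort(sample(nrow(DataWTI_Lag1), nrow(DataWTI_Lag1) * .8))
dt_train_WTI <- DataWTI_Lag1[dt_WTI, ]
dt_test_WTI  <- DataWTI_Lag1[-dt_WTI, ]

# Bare column names in the formula plus data = ..., so that predict()
# can match the USDX column in newdata
fit <- rpart(WTIPrice ~ USDX, data = dt_train_WTI, method = "anova")

# One prediction per test row
prediction <- predict(fit, newdata = dt_test_WTI)

# Root mean squared error on the test set
rmse <- sqrt(mean((dt_test_WTI$WTIPrice - prediction)^2))

# One single-predictor tree per column, collecting the test RMSE of each
predictors <- setdiff(names(dt_train_WTI), c("Index", "WTIPrice"))
rmse_by_predictor <- sapply(predictors, function(p) {
  f <- reformulate(p, response = "WTIPrice")  # builds WTIPrice ~ <p>
  m <- rpart(f, data = dt_train_WTI, method = "anova")
  sqrt(mean((dt_test_WTI$WTIPrice - predict(m, newdata = dt_test_WTI))^2))
})
```

With the real data, passing dt_test_WTI itself as newdata should be enough; there is no need to build a separate one-column data frame, because predict() only picks out the columns named in the formula.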