Hi nirgrahamuk,
thank you very much for your elaboration on the mechanics of the nnet function. Since I am very new to R, I in fact did not know that. Now I understand, however.
I have eliminated every NA-entry in the random data and the old error message disappears. The code is
library(tidyverse)
library(lubridate)
library(readr)
library(caret)
DoseTrack <- read_delim(
  "[Censored]",
  delim = ";",
  escape_double = TRUE,
  col_types = cols(
    `Study Date rounded` = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
    `Exam Code` = col_character(),
    `SSDE Effective Diameter Source` = col_character(),
    `SSDE Effective Diameter (cm)` = col_double(),
    `SSDE Coefficient` = col_double(),
    `SSDE Max (mGy)` = col_double(),
    `Effective Dose 103 (mSv)` = col_double(),
    Pitch = col_double(),
    `SSDE (mGy)` = col_double()
  ),
  locale = locale(
    decimal_mark = ",",
    grouping_mark = "."
  ),
  trim_ws = TRUE
)
DoseTrack <- select(
  DoseTrack,
  -"Phantom Code", -"CTDIPhantomTypeCodeValue", -"CTDIPhantomTypeCodeMeaning", -"Dose Alarm",
  -"Investigation Status", -"Investigation Comment", -"Dose Alert Reason", -"Dose Trigger Value",
  -"Dose Trigger Description", -"Dose Trigger Type", -"Reject Reason Code"
)
DoseTrack.clean <- DoseTrack %>%
  filter(
    # No telemedicine images
    !str_detect(`Station Name`, "TM_") &
      # No localizers
      `Acquisition Type` != "Constant Angle Acquisition" &
      `Acquisition Type` != "Stationary Acquisition" &
      # No interventions
      !str_detect(`Exam Description`, "(Drainage|Punktion)") &
      !str_detect(`Exam Code`, "(Punktion|Intervention)") &
      !str_detect(`Protocol Name`, "Intervention")
  )
model_nnet <- train(
  x = as.data.frame(DoseTrack.clean),
  y = DoseTrack.clean$`CTDIVol (mGy)`,
  method = "nnet",
  preProc = c("center", "scale"),
  trControl = trainControl(
    search = "random",
    allowParallel = TRUE,
    savePredictions = "final"
  ),
  tuneLength = 5,
  maxit = 500,
  MaxNWts = 5000,
  linout = TRUE,
  trace = TRUE
)
warnings()
Now I get the following error message (translated from German):
Error in { : task 1 failed - "Replacement has 1 row, data has 0"
I assume this also has to do with how the nnet function works, but I don't know what to do about it.
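As far as I can tell, the English wording of the inner message is what base R reports when a single value is assigned to a column of a data frame that has zero rows. The following is only a minimal sketch that triggers that kind of failure outside of caret; `empty_df` is a made-up data frame, not my DoseTrack data, and I am only assuming that something similar happens inside `train()`.

```r
# Made-up data frame with zero rows (hypothetical, not the DoseTrack data).
empty_df <- data.frame(a = numeric(0))

# Assigning a single value to a new column of a zero-row data frame fails.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    empty_df$b <- 1
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(nrow(empty_df))
print(msg)
```

If this is indeed the mechanism, it would mean that at some point the data passed on for fitting or prediction has no rows left.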
Also, R still complains about no variation in Ordinal, Pitch and Nominal Total Collimation Width (mm). Why does that happen given that the values are different?
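To make concrete what I mean by "the values are different", this is the kind of check I have in mind. It is only a sketch in base R on a made-up data frame; `toy` and its column names are hypothetical stand-ins for `DoseTrack.clean`. It counts the distinct non-missing values per column, so a column that is constant or entirely NA would show up with fewer than two.

```r
# Hypothetical stand-in for DoseTrack.clean; column names are made up.
toy <- data.frame(
  Pitch   = c(0.8, 0.8, 0.8, 0.8),  # constant column, no variation
  Ordinal = c(1, 2, 3, 4),
  Width   = c(NA, 38.4, 38.4, 40)
)

# Number of distinct non-missing values in each column.
n_distinct_values <- sapply(
  toy,
  function(col) length(unique(col[!is.na(col)]))
)
print(n_distinct_values)

# Columns with fewer than two distinct values cannot vary at all.
constant_cols <- names(n_distinct_values)[n_distinct_values < 2]
print(constant_cols)
```

Such a check only looks at the full data set. I do not know whether the complaint refers to the full data or to the subsets used during resampling.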
I have eliminated the NA columns from the original data as well, but that did not fix the problem there.
I did not check all 50000 rows of the original data for NAs (some could still exist somewhere, whereas the random data is completely clean), but the first rows contain none. So, according to what I understand from your statements, the model should have at least some data to work with, yet it still complains about the missing RMSE values. Do I therefore have to get rid of every NA? That could be the difference between the original and the random data. If so, is there an efficient way to search for them? 50000*60 entries are too many to check by hand.
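One possibility I can think of, sketched here on a made-up data frame, would be to let base R count the missing values with `is.na()`, `colSums()` and `complete.cases()`. The data frame `toy` and its columns are hypothetical, and I am not sure this is the recommended way for data of my size.

```r
# Made-up example data with a few missing values (not the DoseTrack data).
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, "z"),
  c = c(10, 20, 30, 40)
)

# Number of NAs in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row numbers that contain at least one NA.
rows_with_na <- which(!complete.cases(toy))
print(rows_with_na)

# Keep only the rows without any NA.
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Applied to the real data, the per-column counts would at least show which of the 60 columns still contain NAs.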
All the best as well.