Hey there,
I'm new to the R community, so hopefully this is the right place to ask this question.
I'm currently working on a medical database, checking how well several scores predict a disease. The dataset contains these variables:
- disease: either "true" or "false"
- several scores (called "cutoff"): either 1 (= predicted false) or 2 (= predicted true)
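To make that structure concrete, here is a tiny made-up dataset in the same shape. The name `db_example` and all of its values are invented for illustration only; the real database is larger and is not shown here.

```r
# Hypothetical toy data: 4 diseased and 4 non-diseased patients, two scores.
# Scores are coded as in the description: 1 = predicted false, 2 = predicted true.
db_example <- data.frame(
  disease = c("true", "true", "true", "true", "false", "false", "false", "false"),
  cutoff1 = c(2, 2, 2, 1, 1, 1, 2, 1),
  cutoff2 = c(2, 1, 2, 1, 1, 1, 1, 1)
)

# Cross-tabulate the true status against the prediction of one score
table(disease = db_example$disease, cutoff1 = db_example$cutoff1)
```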
I simply wanted to compute sensitivity, specificity and so on for every score. Computing them was pretty easy, but I wanted a table like:
         sens  spec
cutoff1   0.6   0.5
cutoff2   0.7   0.4
cutoff3   0.8   0.6
as a result.
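For a single score, the two numbers in such a table come down to a few lines of base R. This is only a sketch on made-up data: `db_example`, `is_diseased` and `predicted_pos` are hypothetical names, and it assumes the score column contains only 1 or 2 with no missing values.

```r
# Hypothetical toy data in the shape described above
db_example <- data.frame(
  disease = c("true", "true", "true", "true", "false", "false", "false", "false"),
  cutoff1 = c(2, 2, 2, 1, 1, 1, 2, 1)
)

is_diseased   <- db_example$disease == "true"  # condition positive
predicted_pos <- db_example$cutoff1 == 2       # score predicts the disease

# sensitivity = true positives / all condition positives
sens <- sum(is_diseased & predicted_pos) / sum(is_diseased)
# specificity = true negatives / all condition negatives
spec <- sum(!is_diseased & !predicted_pos) / sum(!is_diseased)

c(sens = sens, spec = spec)  # both are 0.75 for this toy data
```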
In addition, the code should be able to take new scores (cutoffs) into account that might appear in the future.
So after tinkering around I decided to write a function based on add_column and add_row that takes two arguments:
- the database to be analyzed
- a character vector in which every element is the name of a score (cutoff) column.
This finally led to my function:
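library(dplyr)   # summarize() and %>%
library(tibble)  # add_row()

sensitivity_function <- function(db, cutoffs) {
  # Step 1: create the empty table that will be the output.
  # cutoff_name has to be a character column so that add_row() can
  # append the score names to it later.
  table <- data.frame(
    cutoff_name = character(),
    true_pos = numeric(),
    false_neg = numeric(),
    false_pos = numeric(),
    true_neg = numeric(),
    cond_pos = numeric(),
    pred_pos = numeric(),
    cond_neg = numeric(),
    pred_neg = numeric(),
    sensitivity = numeric(),
    specifity = numeric(),
    false_pos_rate = numeric(),
    false_neg_rate = numeric()
  )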
  # iterate over all cutoffs
  for (cutoff in cutoffs) {
    # create the variables I need
    db2 <- db %>% summarize(
      true_pos = sum(disease == "true" & as.numeric(.data[[cutoff]]) == 2),
      false_neg = sum(disease == "true" & as.numeric(.data[[cutoff]]) == 1),
      false_pos = sum(disease != "true" & as.numeric(.data[[cutoff]]) == 2),
      true_neg = sum(disease != "true" & as.numeric(.data[[cutoff]]) == 1),
      cond_pos = sum(disease == "true"),
      pred_pos = sum(as.numeric(.data[[cutoff]]) == 2),
      cond_neg = sum(disease != "true"),
      pred_neg = sum(as.numeric(.data[[cutoff]]) == 1),
      sensitivity = round(true_pos / cond_pos, 2),
      specifity = round(true_neg / cond_neg, 2),
      false_pos_rate = round(false_pos / cond_neg, 2),
      false_neg_rate = round(false_neg / cond_pos, 2)
    )
    # add those values for the current cutoff to the table
    table <- table %>% add_row(
      cutoff_name = cutoff,
      true_pos = db2$true_pos[1],
      false_neg = db2$false_neg[1],
      false_pos = db2$false_pos[1],
      true_neg = db2$true_neg[1],
      cond_pos = db2$cond_pos[1],
      pred_pos = db2$pred_pos[1],
      cond_neg = db2$cond_neg[1],
      pred_neg = db2$pred_neg[1],
      sensitivity = db2$sensitivity[1],
      specifity = db2$specifity[1],
      false_pos_rate = db2$false_pos_rate[1],
      false_neg_rate = db2$false_neg_rate[1]
    )
  }
  # return the table
  table
}
# example of how to run it (db is the database described above)
cutoffs <- c("cutoff1","cutoff2","cutoff3")
table <- sensitivity_function(db,cutoffs)
table[c(1,seq(10,13))]
gives
  cutoff_name sensitivity specifity false_pos_rate false_neg_rate
1     cutoff1        0.82      0.69           0.31           0.18
2     cutoff2        0.81      0.71           0.29           0.19
3     cutoff3        0.73      0.77           0.23           0.27
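For comparison, the two headline columns can probably also be computed without growing a table row by row. The following is a rough sketch in base R, not a tested replacement for the function above: `score_metrics`, `one_score` and `db_example` are hypothetical names, and it assumes every score column contains only 1 or 2 with no missing values.

```r
# Hypothetical helper: one row per score, columns sens and spec
score_metrics <- function(db, cutoffs) {
  is_diseased <- db$disease == "true"

  one_score <- function(cutoff) {
    predicted_pos <- as.numeric(db[[cutoff]]) == 2
    c(
      sens = mean(predicted_pos[is_diseased]),    # share of diseased flagged
      spec = mean(!predicted_pos[!is_diseased])   # share of healthy not flagged
    )
  }

  # vapply() returns a 2 x length(cutoffs) matrix; t() puts scores in rows
  round(t(vapply(cutoffs, one_score, numeric(2))), 2)
}

# Made-up data, for illustration only
db_example <- data.frame(
  disease = c("true", "true", "true", "true", "false", "false", "false", "false"),
  cutoff1 = c(2, 2, 2, 1, 1, 1, 2, 1),
  cutoff2 = c(2, 1, 2, 1, 1, 1, 1, 1)
)

result <- score_metrics(db_example, c("cutoff1", "cutoff2"))
result
```

A new score would only need its column name appended to the character vector passed as `cutoffs`. The result here is a numeric matrix with the score names as row names; `as.data.frame(result)` would convert it if a data frame is preferred.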
Is there an easier way to do this?
Thank you!