I'm currently using the "censored" package to generate survival predictions on a test data set (whilst building the models with the training set). With each of the algorithms supported by censored, I can predict each person's probability of survival over time as well as their survival time.

However, I am running into a few issues:

To create an overall curve for the test data, should I just take the average probability across all persons at each time point? censored does not seem to produce an overall curve.

The time predictions do not come with an indicator of whether a patient's record will be censored or not; they just give the time. Is there a way around this?

In terms of validation, what is the most suitable metric? I understand that metrics such as MAE will not suit survival curves.

Could you be more specific about what you mean by "overall curve"? You could average over all persons in your dataset, but would that tell you what you want to know?

If you want to predict whether or not a patient will be censored, wouldn't you build a model for the censoring? Typically survival models model the (time to) event, rather than the censoring.

The aim of the exercise is to train a survival model and compare the predicted survival function against the actual survival curve of the test set. We want the predicted survival probabilities to be as close to reality as possible.

In censored, predict(type = "survival") produces a survival probability for each person at the evaluation times specified in predict(). I've defined the "overall" curve as the average probability across persons at each time point, though I've also seen people just select the first entry of the predict() output. I'm not sure which method is correct.
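To make the averaging concrete, here is a minimal sketch. It assumes a fitted censored/parsnip model `fit` and a held-out data frame `test` (both hypothetical names), and that your version of censored takes the evaluation times via `eval_time`; in older versions the argument was `time`.

```r
library(censored)  # survival engines for tidymodels
library(dplyr)
library(tidyr)

eval_times <- seq(0, 365, by = 30)
preds <- predict(fit, new_data = test,
                 type = "survival", eval_time = eval_times)

# preds$.pred is a list-column: one tibble per person with
# .eval_time and .pred_survival. Averaging across persons at
# each time point gives one candidate "overall" curve.
overall <- preds %>%
  unnest(.pred) %>%
  group_by(.eval_time) %>%
  summarise(mean_survival = mean(.pred_survival))
```

Note that this averaged curve is an estimate of the marginal survival function for the test cohort, which is different from selecting any single person's predicted curve.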

In regards to the censoring - that makes sense. Thank you

Censored does not give you a survival curve in a survfit object, which is what I think prompted your question. You could predict for all persons in your (test) set at various time points and take that as an approximation of their individual survival curves. If you average over all those individual curves and compare that to the true survival curve, you are losing quite a bit of information. Maybe you want to consider concordance as a measure? It's not for the survival probabilities, but you could use it with the time predictions. The yardstick package has a concordance correlation coefficient (ccc).

Yes - the issue here is that I can't use ggsurvplot() to generate a curve for the test cohort as I would for non-ML survival tasks (will survfit objects be coming to censored in the future?)
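For the observed curve of the test cohort, you don't actually need censored: a Kaplan-Meier fit on the test data gives a survfit object that ggsurvplot() accepts. A sketch, assuming the test data has columns named `time` and `status`:

```r
library(survival)
library(survminer)

# Observed (Kaplan-Meier) curve for the whole test cohort;
# ~ 1 means no grouping variable, i.e. one overall curve.
km_test <- survfit(Surv(time, status) ~ 1, data = test)
ggsurvplot(km_test, conf.int = TRUE)
```

This observed curve is the natural reference to plot alongside any averaged predicted curve.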

I've created predictions for all persons in the test set. I guess my question is: is there an easier/more efficient way (i.e. one that doesn't lose as much information) to create a curve for the entire test set population?

I'm currently looking at the concordance index and the Brier score as evaluation metrics, which should help with determining accuracy.
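A sketch of the concordance index with yardstick, assuming a fitted model `fit`, a test set with `time` and `status` columns, and the `concordance_survival()` metric (available in recent yardstick releases); all object names here are hypothetical:

```r
library(yardstick)
library(survival)

# concordance_survival() expects the truth as a Surv object and
# the estimate as a predicted time.
results <- data.frame(
  pred_time = predict(fit, new_data = test, type = "time")$.pred_time
)
results$surv_obj <- Surv(test$time, test$status)

concordance_survival(results, truth = surv_obj, estimate = pred_time)
```

For the Brier score, yardstick also provides brier_survival() and brier_survival_integrated(), which expect the `.pred` list-column (including censoring weights) that the tidymodels tooling produces, so they are easiest to compute inside a tune/augment workflow.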