I want to estimate the probability of default. I developed a Cox PH model with time-dependent covariates, using the coxph function from the survival package to build it. Now I want to predict the probability of default (the event) on a test set. The survival package documentation mentions several options for the type argument of predict().
Which one should I use? My guess is to use type="survival", which gives me the conditional probability of survival, and then to convert that into a probability of default. Is this the correct method?
Also, I want to use a parametric estimate of the baseline hazard function and use it for prediction. Is that possible with Cox regression? How do I get the baseline hazard from a coxph object (I assume basehaz does not really return the baseline hazard)?
I assume you have a time-to-event data set in which the event indicator is 0 for subjects who are right-censored and 1 for those who experienced the event of interest. If you have fitted a Cox PH model such as coxph(Surv(time, event) ~ var, data = data), then predict(model, type = 'survival') gives the probability of survival for each subject given their covariate values and their follow-up time. The complement (i.e. 1 - survival) is the probability of the event of interest. However, if you want to predict survival or risk for everyone at the same fixed time horizon, I suggest the riskRegression package: its predictCox() function accepts a coxph() model and allows much more flexible predictions at fixed time horizons.
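A minimal sketch using only the survival package (the heart and lung data sets ship with it; the models, covariates, hypothetical new subjects and the 365-day horizon are illustrative assumptions, not the asker's model). The first part fits a model with a time-dependent covariate in (start, stop] form and turns type = "survival" predictions into event probabilities. My reading of ?predict.coxph is that for this kind of data each prediction refers to one row, i.e. survival over (start, stop] given that the subject was at risk at start, which matches the "conditional" probability in the question. The second part uses survfit() with summary(times = ...) as a survival-only way to get probabilities at a fixed horizon.

```r
library(survival)

## 1. Time-dependent covariate in counting-process (start, stop] form.
## 'heart' ships with survival; 'transplant' changes value during follow-up.
fit_td <- coxph(Surv(start, stop, event) ~ age + transplant, data = heart)

# type = "survival" is exp(-expected); for (start, stop] rows this is the
# probability of surviving that interval given being at risk at its start.
surv_int <- predict(fit_td, type = "survival")
pd_int <- 1 - surv_int   # conditional probability of the event in each interval

head(data.frame(heart[, c("id", "start", "stop", "transplant")],
                surv = surv_int, pd = pd_int))

## 2. Probability of the event by a fixed horizon for new covariate profiles.
## Time-fixed covariates only; profiles and horizon are hypothetical.
fit_fixed <- coxph(Surv(time, status) ~ age + sex, data = lung)
new_subjects <- data.frame(age = c(50, 70), sex = c(1, 2))
sf <- survfit(fit_fixed, newdata = new_subjects)
horizon <- 365
surv_h <- summary(sf, times = horizon)$surv   # one value per profile at the horizon
pd_h <- 1 - surv_h
pd_h
```

Note that survfit() with newdata assumes each profile keeps its covariate values fixed over time, so for genuinely time-varying covariates the per-interval predictions in part 1 are the safer starting point.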
To your second point, if you want a parametric estimate of the baseline hazard, you should fit a parametric survival model, for example with survreg() from the survival package or with the flexsurv package, which provides very flexible parametric survival models.
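As a sketch of the parametric route with survreg() (the Weibull distribution, the lung data, the covariates and the one-year horizon are assumptions for illustration): survreg() reports an accelerated failure time parameterization, but the Weibull model is also a proportional hazards model, so a closed-form baseline hazard, PH log hazard ratios and event probabilities follow directly from the coefficients and the scale.

```r
library(survival)

# Weibull AFT fit: log(T) = x'b + sigma * W, with W standard extreme value.
wfit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

shape <- 1 / wfit$scale                    # Weibull shape k = 1 / sigma
b0 <- coef(wfit)[["(Intercept)"]]

# Equivalent PH log hazard ratios: beta_PH = -beta_AFT / sigma
beta_ph <- -coef(wfit)[-1] / wfit$scale

# Parametric baseline hazard and cumulative hazard (all covariates zero):
# h0(t) = k * t^(k - 1) * exp(-k * b0),  H0(t) = (t * exp(-b0))^k
h0 <- function(t) shape * t^(shape - 1) * exp(-shape * b0)
H0 <- function(t) (t * exp(-b0))^shape

# Probability of the event by a fixed horizon for hypothetical new subjects
new_subjects <- data.frame(age = c(50, 70), sex = c(1, 2))
lp <- predict(wfit, newdata = new_subjects, type = "lp")   # x'b on the log-time scale
horizon <- 365
pd_weibull <- 1 - exp(-(horizon / exp(lp))^shape)          # 1 - S(t | x)
pd_weibull
```

Because H(t | x) = H0(t) * exp(x' beta_PH), the same probabilities can be obtained from the explicit baseline hazard, which is the kind of parametric baseline the question asks about; flexsurv offers more flexible baseline shapes along the same lines.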
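Confusingly, the name basehaz() suggests that it returns the baseline hazard, and its documentation says it computes a predicted survival curve, but it actually returns the cumulative baseline hazard. By default (centered = TRUE) this is evaluated at the mean covariate values; centered = FALSE gives it for all covariates equal to zero.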
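A small illustration, assuming the lung data and an illustrative age + sex model: basehaz() returns a data frame whose hazard column is cumulative. The Cox model leaves the baseline hazard unspecified, so the estimate is a step function; its jumps (taken here with diff()) are the closest thing to a "baseline hazard" a coxph fit provides, and survival for a given covariate vector follows from S(t | x) = exp(-H0(t) * exp(x' beta)). The subject and the 365-day horizon are hypothetical.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Cumulative baseline hazard H0(t) at covariates = 0 (centered = FALSE);
# the default centered = TRUE gives it at the mean covariate values instead.
H0 <- basehaz(fit, centered = FALSE)
head(H0)   # columns: hazard (cumulative), time

# Jumps of the step function H0(t): estimated baseline hazard increments
dH0 <- diff(c(0, H0$hazard))

# Survival at a fixed time for a given covariate vector x
t_h <- 365
H0_t <- max(H0$hazard[H0$time <= t_h])   # step function evaluated at t_h
x <- c(age = 60, sex = 1)                # hypothetical subject
S_t <- exp(-H0_t * exp(sum(coef(fit) * x)))
S_t
```

The value should agree with summary(survfit(fit, newdata = data.frame(age = 60, sex = 1)), times = 365), which is usually the more convenient route.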
I'm not really sure whether that function can safely handle time-dependent covariates. When I am interested in predictions (e.g. hazards, probabilities) from survival models, I usually turn to parametric survival models and have not had any real issues.