Predicting probability of default with a Cox PH model and getting the baseline hazard


I have a couple of questions about the Cox PH model.

  1. I want to estimate the probability of default. I developed a Cox PH model with time-dependent covariates, using the coxph function from the survival package. Now I want to predict the probability of default (the event) on a test set. The survival package documentation lists several options for the prediction type:

type=c("lp", "risk", "expected", "terms", "survival")

Which one should I use? My guess is type = "survival", which gives the conditional probability of survival, and then converting that into a probability of default. Is this the correct approach?

  2. I also want to use a parametric estimate of the baseline hazard function for prediction. Is that possible with Cox regression? How do I get the baseline hazard from a coxph object? (I assume basehaz() does not really return the baseline hazard.)

Thanks in advance.

Hi @Lev_ani,

I assume you have a time-to-event data set where 0 marks subjects who are right-censored and 1 marks those who had the event of interest. Suppose you have built a Cox PH model, for example with coxph(Surv(time, event) ~ var, data = data).

Then predict(model, type = 'survival') gives the probability that a subject survives up to their own observed follow-up time, given their covariate values. Its complement (1 - survival) is the probability of the event of interest by that time. However, if you want to predict survival or risk for everyone at the same fixed time horizon, I suggest the riskRegression package: its predictCox() function accepts a coxph() model and makes predictions at fixed time horizons much more flexible.
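A minimal sketch of both kinds of prediction using only the survival package, assuming simulated data (the variable names x, time, event, the horizon t = 12 and the new covariate values are all hypothetical). predict() gives the probability at each subject's own follow-up time; survfit() plus summary(..., times =) is one way to get a common horizon without riskRegression. The predict.coxph documentation states that type = "survival" equals exp(-expected), which the sketch also computes.

```r
library(survival)

# Hypothetical simulated data: event = 1 means default, 0 means right-censored
set.seed(42)
n <- 300
dat <- data.frame(x = rnorm(n))
event_time <- rexp(n, rate = 0.05 * exp(0.7 * dat$x))
cens_time <- runif(n, 0, 30)
dat$time <- pmin(event_time, cens_time)
dat$event <- as.integer(event_time <= cens_time)

model <- coxph(Surv(time, event) ~ x, data = dat)

# 1) Probability at each subject's OWN follow-up time
#    (newdata must contain the response variables time and event)
surv_own <- predict(model, newdata = dat, type = "survival")
pd_own <- 1 - surv_own
expected_own <- predict(model, newdata = dat, type = "expected")  # survival = exp(-expected)

# 2) Probability of default by a COMMON horizon (t = 12 is an arbitrary example)
new_obs <- data.frame(x = c(-1, 0, 1))
sf <- survfit(model, newdata = new_obs)
pd_12 <- 1 - summary(sf, times = 12)$surv
print(pd_12)
```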

To your second point, if you want a parametric estimate of the baseline hazard, you should fit a parametric survival model, for example with survreg() or with the flexsurv package, which provides very flexible parametric survival models.
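If you go the parametric route with survreg(), note that it uses an accelerated failure time parametrisation, so the probability of default has to be computed from the linear predictor and the scale. A minimal sketch for a Weibull model on simulated data; the data, the true shape 1.5, the horizon t = 12 and the helpers pd_weibull() and h0() are hypothetical, not part of any package. Under survreg's Weibull form, log T = lp + scale * W with W standard extreme value, so S(t | x) = exp(-exp((log t - lp) / scale)). The last lines convert the intercept and scale into the usual Weibull baseline hazard for x = 0.

```r
library(survival)

# Hypothetical simulated Weibull data (shape 1.5); larger x -> shorter times
set.seed(7)
n <- 300
dat <- data.frame(x = rnorm(n))
event_time <- rweibull(n, shape = 1.5, scale = 10 * exp(-0.5 * dat$x))
cens_time <- runif(n, 0, 25)
dat$time <- pmin(event_time, cens_time)
dat$event <- as.integer(event_time <= cens_time)

wfit <- survreg(Surv(time, event) ~ x, data = dat, dist = "weibull")

# Hypothetical helper: PD(t | x) = 1 - S(t | x), with
# S(t | x) = exp(-exp((log(t) - lp) / scale)) under survreg's parametrisation
pd_weibull <- function(fit, newdata, t) {
  lp <- predict(fit, newdata = newdata, type = "lp")
  1 - exp(-exp((log(t) - lp) / fit$scale))
}

new_obs <- data.frame(x = c(-1, 0, 1))
pd_12 <- pd_weibull(wfit, new_obs, t = 12)

# Baseline (x = 0) Weibull hazard: h0(t) = (k / lambda0) * (t / lambda0)^(k - 1)
k <- 1 / wfit$scale                           # Weibull shape
lambda0 <- exp(coef(wfit)[["(Intercept)"]])   # Weibull scale at x = 0
h0 <- function(t) (k / lambda0) * (t / lambda0)^(k - 1)
print(pd_12)
```

As a sanity check, evaluating pd_weibull() at the median from predict(wfit, type = "quantile", p = 0.5) should return 0.5 for every subject.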

Confusingly, the name basehaz() suggests that it returns the baseline hazard, and its documentation suggests that it returns a predicted survival curve, but it actually returns the cumulative baseline hazard (by default for a subject at the mean covariate values, unless you set centered = FALSE).
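A minimal sketch on simulated data (all names hypothetical) of what basehaz() returns and how the other quantities follow from it: the hazard column is the cumulative baseline hazard H0(t), its jumps are step-function hazard increments (not a smooth h0(t)), exp(-H0(t)) is the baseline survival, and the probability of default for covariate value x by time t is 1 - exp(-H0(t) * exp(beta * x)). centered = FALSE makes the baseline refer to x = 0 instead of the covariate means; the helper pd_at() is illustrative only.

```r
library(survival)

# Hypothetical simulated data; x has mean 2 so that centering visibly matters
set.seed(3)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 2))
event_time <- rexp(n, rate = 0.05 * exp(0.6 * dat$x))
cens_time <- runif(n, 0, 30)
dat$time <- pmin(event_time, cens_time)
dat$event <- as.integer(event_time <= cens_time)

fit <- coxph(Surv(time, event) ~ x, data = dat)

# Cumulative baseline hazard H0(t) at x = 0 (columns: hazard, time)
bh <- basehaz(fit, centered = FALSE)

# Jumps of H0: a step-function estimate of the hazard increments
dH0 <- diff(c(0, bh$hazard))

# Baseline survival S0(t) = exp(-H0(t))
S0 <- exp(-bh$hazard)

# Hypothetical helper: PD by time t for covariate value x under proportional hazards
pd_at <- function(x, t) {
  H0_t <- if (any(bh$time <= t)) max(bh$hazard[bh$time <= t]) else 0
  1 - exp(-H0_t * exp(coef(fit)[["x"]] * x))
}
print(pd_at(1, 12))
```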


Dear @mattwarkentin, thank you very much. I have time-varying covariates in my model, and when I try to predict with predictCox() I get the following warning:

The current version of predictCox was not designed to handle left censoring

Can I use this function for prediction anyway?

I'm not sure whether that function can safely handle time-dependent covariates. When I want predictions (e.g. hazards, probabilities) from survival models, I usually turn to parametric survival models and have not had any issues.
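If you want to stay within the survival package, one possible alternative for time-varying covariates is survfit() with a covariate path: each row of newdata gives the covariate values over one (tstart, tstop] interval for a new subject, and the id argument groups those rows into a single path. A minimal sketch on simulated counting-process data, where the covariate z, its change at t = 5, the new subject's path and the horizon t = 10 are all hypothetical. It assumes the future covariate path is known (or assumed) in advance, which is usually the hard part when forecasting default with time-varying covariates.

```r
library(survival)

set.seed(11)
n <- 300
x <- rnorm(n)
z0 <- rbinom(n, 1, 0.5)
event_time <- rexp(n, rate = 0.05 * exp(0.5 * x))
cens_time <- runif(n, 0, 30)
obs <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)

# Hypothetical time-varying covariate z: z0 on (0, 5], z0 + 1 after t = 5.
# Counting-process (start, stop] layout: one row per subject per interval.
early <- data.frame(id = seq_len(n), tstart = 0, tstop = pmin(obs, 5),
                    status = ifelse(obs <= 5, status, 0L), x = x, z = z0)
late_ids <- which(obs > 5)
late <- data.frame(id = late_ids, tstart = 5, tstop = obs[late_ids],
                   status = status[late_ids], x = x[late_ids], z = z0[late_ids] + 1)
long <- rbind(early, late)

fit_td <- coxph(Surv(tstart, tstop, status) ~ x + z, data = long)

# Assumed covariate path for one new subject: z = 0 up to t = 5, then z = 1
path <- data.frame(id = 1, tstart = c(0, 5), tstop = c(5, 15), status = 0,
                   x = 0.5, z = c(0, 1))
sf_path <- survfit(fit_td, newdata = path, id = id)

# Probability of default by t = 10 along that path
pd_10 <- 1 - summary(sf_path, times = 10)$surv
print(pd_10)
```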

