Plotting multiple curves with cox proportional hazards and binary factor predictor

I'm working on some survival analysis using a cox model with a single variable. I would like to plot the survival curves of the two but am struggling.


library(tidyverse)
library(survival)
library(survminer)

set.seed(123)
pdata <- data.frame(
  time = rexp(200, rate = 0.1),  # Survival times
  status = sample(0:1, 200, replace = TRUE),  # Censoring indicator
  sub_plan = factor(sample(c("pro", "standard"), 200, replace = TRUE))  # Subscription plans
)

fit.cox <- coxph(Surv(time, status) ~ sub_plan, data = pdata)
surv_estimates <- survfit(fit.cox)

ggsurvplot(
  surv_estimates,
  data = pdata,
  pval = FALSE,
  conf.int = FALSE,
  palette = c("#E7B800", "#2E9FDF"),
  xlab = "Time",
  ylab = "Survival Probability",
  ggtheme = theme_minimal()
)

This results in the following plot:

Hoped/expected/wouldlike two curves but just get a single one, one for sub_plan 'pro' and one for 'standard'. Can I do this using my fit and ggsurvplot?

What happens if you add color = sub_plan as one of the arguments to ggsurvplot()?

I was able to get this working with some help from a datacamp lesson:

fit.cox <- coxph(Surv(time, status) ~ sub_plan, data = pdata)
new_pdata <- pdata |> distinct(sub_plan)
surv_estimates <- survfit(fit.cox, data = pdata, newdata = new_pdata, conf.type = "none")

ggsurvplot(
  surv_estimates,
  data = pdata,
  pval = FALSE,
  conf.int = FALSE,
  palette = c("#E7B800", "#2E9FDF"),
  xlab = "Time",
  ylab = "Survival Probability",
  ggtheme = theme_minimal()
)

So surv_estimates has to be passed the fitted model, existing data pdata and also new_data with the distinct values of sub_plan

I tried adding color = sub_plan but ggsurvplot complained about not recognizing sub_plan. Anyway, was able to get it working, see solution. Thanks all the same!

You shouldn't have to use newdata when you have a single table, and I would think those curves look suspiciously similar. Using the coxph() function results in a loss of the sub_plan identifier, but even so, the strata can be preserved through the strata() function, like this:

library(tidyverse)
library(survival)
library(survminer)

set.seed(123)
pdata <- data.frame(
  time = rexp(200, rate = 0.1),  # Survival times
  status = sample(0:1, 200, replace = TRUE),  # Censoring indicator
  sub_plan = factor(sample(c("pro", "standard"), 200, replace = TRUE))  # Subscription plans
)

fit.cox <- coxph(Surv(time, status) ~ strata(sub_plan), data = pdata)
surv_estimates <- survfit(fit.cox)
ggsurvplot(
  surv_estimates,
  data = pdata,
  pval = FALSE,
  conf.int = FALSE,
  palette = c("#E7B800", "#2E9FDF"),
  xlab = "Time",
  ylab = "Survival Probability",
  ggtheme = theme_minimal()
)

Created on 2024-05-11 with reprex v2.0.2

1 Like

Interesting. Thanks for this. With your model:


 fit.cox |> summary()
Call:  coxph(formula = Surv(time, status) ~ strata(sub_plan), data = pdata)

Null model
  log likelihood= -349.3911 
  n= 200 

With the model not using strata():


fit.cox |> summary()
Call:
coxph(formula = Surv(time, status) ~ sub_plan, data = pdata)

  n= 200, number of events= 96 

                    coef exp(coef) se(coef)      z Pr(>|z|)
sub_planstandard -0.2423    0.7848   0.2059 -1.177    0.239

                 exp(coef) exp(-coef) lower .95 upper .95
sub_planstandard    0.7848      1.274    0.5242     1.175

Concordance= 0.527  (se = 0.03 )
Likelihood ratio test= 1.39  on 1 df,   p=0.2
Wald test            = 1.39  on 1 df,   p=0.2
Score (logrank) test = 1.39  on 1 df,   p=0.2

It's a different model no? I do want the coefficients for sub_plan? Am I misunderstanding?

This is the effect of the coxph() function, which collapses the two subplans into a single one called "sub_planstandard" if you don't use strata() to treat the subplans as distinct. This is why you got a single curve in your original post. Without strata(), the distinction is lost.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.