Cross Validated Asked on November 14, 2021
I am new to the field of survival analysis. I was reading about the interpretation of the C-index and realized it only cares about the ordering of the predictions. I have always used the scikit-survival package and never thought deeply about how the C-index is calculated when actual survival times are not predicted by a Cox proportional hazards model. I would appreciate it if someone could explain this to me simply.
You are correct that time is not the default output of a Cox model. However, for any given unit with its covariate pattern, the model gives a relative hazard. Units with higher hazard ratios are expected to have shorter times to event. The censored c-index compares the ordering of the estimated hazard ratios with both the actual event status and the actual time to event (or censoring time) to produce its estimate.
Answered by Todd D on November 14, 2021
Below is my attempt to answer this question.
The concordance index is a measure of how well your model discriminates.
For survival analysis, say you have a covariate $X$ and a survival time $T$.
Assume that higher values of $X$ imply shorter values of $T$ (thus $X$ has a deleterious effect on $T$).
Discrimination means that, given two patients, you are able to say with high reliability which one will have the shorter survival time.
For a perfectly discriminative model, if you pick two subjects at random, $(X_1,T_1)$ and $(X_2,T_2)$, then the one with the larger value of $X$ will have, with probability $1$, the shorter survival time:
$$ c = \mathbb{P}(T_1 < T_2 \mid X_1 \geq X_2) = 1 $$
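In your dataset, if you pick two patients at random, there are 4 cases (taking $X_1 \geq X_2$ without loss of generality):
1. the couple is concordant: $X_1 > X_2$ and $T_1 < T_2$;
2. the couple is discordant: $X_1 > X_2$ and $T_1 > T_2$;
3. the two patients have the same risk: $X_1 = X_2$;
4. the two patients have the same survival time: $T_1 = T_2$.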
The last case is not taken into account to estimate the concordance (at least I think so).
In case $3$, since the two patients have the same risk, the best you can do to say which one will have the shorter survival time is to toss a fair coin.
The estimated concordance index based on your data is:
$$ \hat c = \frac{C + \frac{R}{2}}{C + D + R} $$ where $C$ and $D$ are the total numbers of concordant and discordant couples, and $R$ is the total number of couples with exactly the same risk. The $\frac{R}{2}$ in the numerator comes from the coin toss.
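The pair counting behind this formula can be sketched in a few lines of plain Python. This is only an illustration, not the code of any survival package: the function name and the toy data are made up, all event times are assumed to be observed, higher risk is assumed to mean shorter survival, and couples with tied survival times are assumed to be the ones left out.

```python
from itertools import combinations

def concordance_uncensored(risk, time):
    """Estimate c = (C + R/2) / (C + D + R) when every event time is observed.

    Assumes higher risk means shorter survival time.
    Couples with tied survival times are skipped (an assumption of this sketch).
    """
    concordant = discordant = tied_risk = 0
    for (r1, t1), (r2, t2) in combinations(zip(risk, time), 2):
        if t1 == t2:
            continue  # tied times: couple not used
        if r1 == r2:
            tied_risk += 1  # same risk: counts as a coin toss (1/2)
        elif (r1 > r2) == (t1 < t2):
            concordant += 1  # higher risk has the shorter time
        else:
            discordant += 1
    total = concordant + discordant + tied_risk
    return (concordant + tied_risk / 2) / total, concordant, discordant, tied_risk

# Hypothetical toy data with one risk tie and one discordant couple
risk = [5, 4, 4, 1]
time = [2, 3, 6, 4]
c_hat, C, D, R = concordance_uncensored(risk, time)
print(C, D, R, c_hat)  # 4 1 1 0.75
```

Here the two patients with risk 4 form the tied couple, so they contribute $\frac{1}{2}$ to the numerator and $1$ to the denominator.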
When there is censoring (as is often the case with survival data), the computation of $\hat c$ is modified, but the idea and interpretation of $c$ remain the same.
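One common way to make that modification is Harrell's C, where a couple is only used if the patient with the shorter observed time actually had the event; otherwise the true ordering of the two event times is unknown. The sketch below is a simplified, assumed version of that rule (the function name and data are hypothetical, and tied times are simply skipped, which real implementations may treat differently).

```python
from itertools import combinations

def harrell_c(risk, time, event):
    """Concordance with right censoring (a simplified sketch of Harrell's C).

    A couple is comparable only if the subject with the shorter observed
    time had an event (event == 1). Higher risk should mean shorter time.
    """
    concordant = discordant = tied_risk = 0
    subjects = list(zip(risk, time, event))
    for a, b in combinations(subjects, 2):
        # order the couple so that `first` has the shorter observed time
        first, second = (a, b) if a[1] <= b[1] else (b, a)
        if first[1] == second[1] or not first[2]:
            continue  # tied times, or shorter time is censored: not comparable
        if first[0] == second[0]:
            tied_risk += 1
        elif first[0] > second[0]:
            concordant += 1  # the higher-risk subject failed first
        else:
            discordant += 1
    return (concordant + tied_risk / 2) / (concordant + discordant + tied_risk)

# Hypothetical data: event = 1 means observed, 0 means censored
risk = [3.0, 2.0, 0.5, 1.0]
time = [2, 4, 5, 7]
event = [1, 0, 1, 1]
c = harrell_c(risk, time, event)
print(c)  # 0.75
```

The second patient is censored at time 4, so the couples in which that patient has the shorter time are dropped: 4 comparable couples remain, 3 of them concordant. In scikit-survival, the function `concordance_index_censored` in `sksurv.metrics` computes a censored concordance index of this kind; its handling of ties may differ from this sketch.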
Example
Say you have $8$ patients with data:
$$
\begin{array}{c|c|c}
\text{Id} & \text{Time } (T) & X \\ \hline
1 & 1 & 1 \\
2 & 2 & 3 \\
3 & 3 & 2 \\
4 & 12 & 10 \\
5 & 17 & 15 \\
6 & 27 & 40 \\
7 & 36 & 60 \\
8 & 55 & 80
\end{array}
$$
In that case, we see that larger values of $X$ imply larger values of $T$. Thus a couple is concordant if $X_1 < X_2$ and $T_1 < T_2$.
There are $\binom{8}{2}=28$ possible couples of patients. Among those, only the couple $(2,3)$ is discordant (since $X_2 > X_3$ but $T_2 < T_3$). There is no couple with equal risk, thus $R=0$.
Then the estimated concordance index is $\frac{27}{28} \approx 0.964$.
You can check this with the R package survival
(sorry I'm not used to survival analysis with Python):
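library(survival)
# Survival times and covariate for the 8 patients
time <- c(1, 2, 3, 12, 17, 27, 36, 55)
X <- c(1, 3, 2, 10, 15, 40, 60, 80)
data <- data.frame(time = time, X = X)
# All events are observed (status = 1), so there is no censoring
mod <- coxph(Surv(time, rep(1, 8)) ~ X, data = data)
mod$concordance # concordance ~0.964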
So, to answer your question about predicted times, you can see that neither the values of $T$ nor those of $X$ matter for the estimation of $c$: it is only a matter of ordering between the predictor and the survival times. You can change the values in the previous example and, as long as the number of concordant and discordant couples is unchanged, you still get the same estimated concordance.
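For readers working in Python, the same check can be sketched without any survival package. The helper `c_index` below is a hypothetical name and assumes no censoring and no ties; as in the example, a couple counts as concordant when the larger $X$ goes with the larger $T$.

```python
import math
from itertools import combinations

def c_index(score, time):
    """Count couples where the higher score has the longer time
    (no censoring and no ties assumed). Returns (concordant, total)."""
    pairs = list(combinations(zip(score, time), 2))
    concordant = sum((s1 < s2) == (t1 < t2) for (s1, t1), (s2, t2) in pairs)
    return concordant, len(pairs)

time = [1, 2, 3, 12, 17, 27, 36, 55]
X = [1, 3, 2, 10, 15, 40, 60, 80]

original = c_index(X, time)
print(original)  # (27, 28)

# Strictly increasing transforms keep the ordering, hence the same counts
log_X = [math.log(x) for x in X]
squared_time = [t ** 2 for t in time]
transformed = c_index(log_X, squared_time)
print(transformed)  # (27, 28)
```

Taking the logarithm of $X$ and squaring $T$ changes every value but no ordering, so the count stays at 27 out of 28.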
In which direction should I look for the covariate $X$?
Is a couple concordant if $X_1 > X_2$ and $T_1 < T_2$ or if $X_1 < X_2$ and $T_1 < T_2$?
For the Cox model, it depends on the estimated hazard ratio. If the ratio $e^\beta$ is $>1$, then larger values of $X$ imply larger risks and thus shorter times. So if $e^\beta > 1$, a couple is concordant if $X_1 > X_2$ and $T_1 < T_2$, and if $e^\beta < 1$, a couple is concordant if $X_1 < X_2$ and $T_1 < T_2$.
Finally, in the case of a vector of covariates, I think the procedure remains the same, but instead of using the vector $X$ we use the predicted risk $\hat\beta X$, with $\hat\beta$ estimated from the Cox model.
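A minimal sketch of that idea, assuming no censoring and no ties: each patient's covariate vector is collapsed into a single risk score, and the couples are then ordered on that score. The coefficients and data below are invented for illustration and are not estimated from any model.

```python
from itertools import combinations

def risk_scores(beta, rows):
    """Linear predictor (sum of beta_j * x_j) for each subject."""
    return [sum(b * x for b, x in zip(beta, row)) for row in rows]

def c_index(risk, time):
    """Share of couples in which the higher risk score has the shorter time
    (no censoring and no ties assumed)."""
    pairs = list(combinations(zip(risk, time), 2))
    good = sum((r1 > r2) == (t1 < t2) for (r1, t1), (r2, t2) in pairs)
    return good / len(pairs)

# Hypothetical coefficients and covariates, for illustration only
beta_hat = [0.8, -0.5]
rows = [(2.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 2.0)]
time = [1, 4, 3, 9]

c = c_index(risk_scores(beta_hat, rows), time)

# Flipping the sign of every coefficient reverses the risk ordering
c_flipped = c_index(risk_scores([-b for b in beta_hat], rows), time)
print(c, c_flipped)
```

With no ties, reversing the sign of the coefficients turns every concordant couple into a discordant one, so the two values sum to $1$. This is the multivariate counterpart of the $e^\beta > 1$ versus $e^\beta < 1$ rule.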
Answered by periwinkle on November 14, 2021