TransWikia.com

Survival Analysis: Pseudo Observation Vs Stratified Cox Regression. Which one is better?

Data Science Asked by Ajay H on January 28, 2021

I’ve been looking into the Cox Regression method for Survival Analysis in Churn Prediction. Cox regression models the instantaneous rate at which a subscriber unsubscribes at time $t$, given that they are still subscribed at $t$. This is the hazard rate:

$$
h(t \lvert \boldsymbol{X}_i) = h_0(t)\exp\big(\boldsymbol{\beta}^T\boldsymbol{X}_i\big)
$$

Where

  • $h_0(t)$: Baseline hazard, i.e. the hazard of churn at time $t$ for a customer whose influencing factors are all 0. It is a rate, not a probability.

  • $\boldsymbol{\beta} \in \mathbb{R}^D$: Coefficients. The exponential of each coefficient gives us a hazard ratio. These should be constant w.r.t. time (proportionality assumption).

  • $\boldsymbol{X} \in \mathbb{R}^{N \times D}$: Covariates of the $N$ sample customers, where $\boldsymbol{X}_i$ is the row for customer $i$.


Problem: Proportional Hazards Assumption: Cox regression assumes that the hazard ratios remain constant through time $t$. For example, for a covariate $X_1$ = “gender”, say $\exp(\beta_1) = 1.8$ (that is, $\beta_1 \approx 0.59$). In plain English, it means male subscribers leave the service at a rate $80\%$ higher than females at time $t$. However, this $80\%$ must hold for any time $t$.
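
To make the assumption concrete, here is a minimal sketch in plain Python. It takes the gender hazard ratio to be 1.8 (so $\exp(\beta_1) = 1.8$) and uses a made-up baseline hazard; the function names and all numbers are assumptions for illustration only. Whatever shape the baseline has, the ratio of the two hazards comes out the same at every $t$.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard h_0(t); any positive function of t would do."""
    return 0.02 + 0.01 * t

def cox_hazard(t, x, beta):
    """Cox hazard h(t | x) = h_0(t) * exp(beta . x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [math.log(1.8)]        # chosen so that exp(beta_1) = 1.8
male, female = [1.0], [0.0]   # assumed coding of the gender covariate

ratios = []
for t in (1, 6, 24):
    ratio = cox_hazard(t, male, beta) / cox_hazard(t, female, beta)
    ratios.append(ratio)
    print(f"t={t:>2}  hazard ratio = {ratio:.2f}")
```

The baseline hazard cancels in the ratio, which is why the model cannot represent a gender effect that fades or grows with tenure.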

This is usually an unreasonable constraint for many variables. But there are other methods that can incorporate variables that don’t follow the proportional hazards assumption:

  • stratified Cox regression
  • pseudo-observations
  • Cox regression with time-dependent covariates

I was just reading up on stratified Cox regression. The only apparent downsides here are:

  • The variables that are stratified need to be converted into categorical variables.
  • The stratified categorical variables should not have too many levels (degrees of freedom). Otherwise there will be a LARGE number of strata, each with its own baseline hazard to be estimated.
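
For reference, the stratified model is usually written with one baseline hazard per stratum and a single coefficient vector shared by all strata. The stratum index $s$ and the number of strata $S$ are notation introduced here for illustration:

$$
h_s(t \lvert \boldsymbol{X}_i) = h_{0s}(t)\exp\big(\boldsymbol{\beta}^T\boldsymbol{X}_i\big), \quad s = 1, \dots, S
$$

The stratifying variable enters only through $h_{0s}(t)$, so its effect is free to vary with time, but no hazard ratio is estimated for it.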

Question: Is the pseudo-observation approach similar? Does it have less or more rigid constraints? And how does it perform, considering I have copious amounts of data?
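
For context, a pseudo-observation for the survival probability at a fixed time $t$ is commonly defined by a jackknife on the Kaplan-Meier estimate: $n\,\hat S(t) - (n-1)\,\hat S_{-i}(t)$, where $\hat S_{-i}$ leaves customer $i$ out. The sketch below is a plain-Python illustration of that definition; the function names and the toy data are made up, and it is not tuned for large $N$ (it recomputes the estimate once per customer).

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for churn, 0 for censoring."""
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        churned = sum(1 for ti, e in zip(times, events) if ti == u and e)
        surv *= 1.0 - churned / at_risk
    return surv

def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t): n*S(t) - (n-1)*S_{-i}(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]     # leave customer i out
        loo_events = events[:i] + events[i + 1:]
        loo = km_survival(loo_times, loo_events, t)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo

# Toy data (made up): months until churn or censoring, and churn indicator.
times = [2, 3, 3, 5, 8, 9, 12, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
pseudo = pseudo_observations(times, events, t=6)
```

Each customer gets one number per chosen time point, censored or not, and those numbers are then typically used as the response in a regression on the covariates. Without censoring, the pseudo-observation reduces to the indicator that the customer is still subscribed after $t$.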

One Answer

I suggest using a model with more relaxed assumptions on the proportionality of hazards. In my work I use a piecewise constant hazard model, which works wonderfully. Its assumption is that the hazards are proportional only within each time interval. It allows numerical covariates to be modelled with splines, and it supports time-dependent covariates. Moreover, in my experience the model is usually very well calibrated and does not overfit much.
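
As a rough illustration of the idea, the sketch below computes a survival probability when the baseline hazard is constant on each interval. The cut points, rates and coefficient are hypothetical values, not fitted ones, and a single coefficient vector is used for all intervals, which is a simplifying assumption; letting the coefficients differ by interval would be one way to relax proportionality further.

```python
import math

def piecewise_survival(t, cuts, rates, x, beta):
    """S(t | x) when the baseline hazard equals rates[k] on [cuts[k], cuts[k+1])."""
    hazard_ratio = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    cumulative = 0.0
    for k, rate in enumerate(rates):
        start = cuts[k]
        end = cuts[k + 1] if k + 1 < len(cuts) else math.inf
        exposure = max(0.0, min(t, end) - start)  # time spent in interval k
        cumulative += rate * exposure
    return math.exp(-cumulative * hazard_ratio)

cuts = [0.0, 3.0, 12.0]       # interval start points in months (hypothetical)
rates = [0.10, 0.04, 0.02]    # baseline hazard in each interval (hypothetical)
beta = [0.5]                  # one covariate coefficient (hypothetical)

for months in (3, 6, 24):
    s = piecewise_survival(months, cuts, rates, [1.0], beta)
    print(f"S({months} | x=1) = {s:.3f}")
```

The cumulative hazard is just the sum of rate times exposure over the intervals, so the likelihood has the same form as a Poisson regression with an exposure offset. That is what makes splines and time-dependent covariates easy to add.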

Answered by Gino_JrDataScientist on January 28, 2021
