TransWikia.com

Effect Analysis with Survival Bias Data

Data Science Asked on August 12, 2021

First of all, let me define some terms:

  1. Survival Bias
    The data we collect is biased, because we can only identify users who survived the whole process
  2. Landing Page
    The first page that visitors arriving from advertising campaigns see
    (it usually includes a call to action, so that visitors sign up)
  3. Signup Rate
    number of landing page visitors who sign up / total number of landing page visitors

I am currently analysing the effect of X (landing page load time) on Y (signup rate).

At first glance, we can group X, the landing page load time, into bins and plot Y, the signup rate, for each bin.
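To make the binned view concrete, here is a minimal sketch of how I would compute the signup rate per load-time bin. The function name `signup_rate_by_bin`, the 1-second bin width, and the sample records are all made up for illustration; the real data would come from our tracking logs.

```python
from collections import defaultdict


def signup_rate_by_bin(visits, bin_width=1.0):
    """Group recorded visits into load-time bins and return the signup rate per bin.

    visits: iterable of (load_time_seconds, signed_up) pairs.
    Returns a dict mapping the lower edge of each bin to its signup rate.
    """
    totals = defaultdict(int)
    signups = defaultdict(int)
    for load_time, signed_up in visits:
        # Lower edge of the bin this visit falls into
        bin_start = (load_time // bin_width) * bin_width
        totals[bin_start] += 1
        signups[bin_start] += int(signed_up)
    return {b: signups[b] / totals[b] for b in sorted(totals)}


# Hypothetical records, only to show the shape of the input
visits = [
    (0.4, False), (0.8, True), (0.9, False), (0.2, False),
    (1.5, True), (1.7, False),
    (3.2, True), (3.9, True),
]

rates = signup_rate_by_bin(visits, bin_width=1.0)
for bin_start, rate in rates.items():
    print(f"load time in [{bin_start:.0f}s, {bin_start + 1:.0f}s): signup rate = {rate:.2f}")
```

Note that this only covers visitors who were actually recorded, so it computes the observed signup rate, not the rate over everyone who requested the page.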

However, the result will most probably be that the longer X is, the higher Y is, which doesn't make sense, because a longer page load time should, if anything, decrease the signup rate.

After some thought, I think this is because the set of recorded visitors is already biased.
In the data collection process, we only see visitors who waited for the page to load completely. In other words, if someone leaves in the middle of the page load, we have no way to record them. (And now it makes sense that visitors who sat through a long loading time are the ones with higher interest, because they were willing to wait.) Because of this, visitors with longer page load times show a higher signup rate.

So now we have:
i) Longer page load time > higher-quality users on average (only the ones who survived) > higher signup rate
ii) Shorter page load time > lower-quality users on average (basically all visitors, because the page may load within 1 second) > lower signup rate
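The mechanism in i) and ii) can be reproduced with a small simulation. This is only a toy sketch under assumptions I made up: each visitor has a hidden `interest` between 0 and 1, a visitor waits at most `10 * interest` seconds, and the true signup probability is `interest * (1 - 0.02 * load_time)`, so load time really does hurt signups. None of these numbers come from real data.

```python
import random


def simulate(n_visitors=100_000, seed=0):
    """Compare the observed signup rate (survivors only) with the true rate (all visitors)."""
    rng = random.Random(seed)
    # Per bin: [all visitors, recorded visitors (page fully loaded), signups]
    stats = {"fast (<2s)": [0, 0, 0], "slow (>=8s)": [0, 0, 0]}
    for _ in range(n_visitors):
        interest = rng.random()             # hidden visitor quality, assumed uniform
        load_time = rng.uniform(0.0, 10.0)  # independent of interest
        if load_time < 2.0:
            key = "fast (<2s)"
        elif load_time >= 8.0:
            key = "slow (>=8s)"
        else:
            continue  # keep only the two extreme bins for a clear contrast
        # Assumed patience rule: more interested visitors wait longer
        survived = interest * 10.0 >= load_time
        # Assumed true effect: longer load time lowers the signup probability
        signed_up = survived and rng.random() < interest * (1.0 - 0.02 * load_time)
        s = stats[key]
        s[0] += 1
        s[1] += survived
        s[2] += signed_up
    return {
        key: {"observed_rate": s[2] / s[1], "true_rate": s[2] / s[0]}
        for key, s in stats.items()
    }


result = simulate()
for key, rates in result.items():
    print(key, {name: round(value, 3) for name, value in rates.items()})
```

In this toy setup the observed rate (signups divided by recorded visitors) is higher in the slow bin, while the true rate (signups divided by everyone who requested the page) is lower there, which is exactly the contradiction I described.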

Is there a way to still estimate the effect of X on Y, even though the data is biased?
