Input with variable length Classification problem

Question

I have a dataset of patient information with discrete labels (the stages of a particular disease) that need to be predicted, so this is basically a classification problem.

The dataset looks something like below:

Patient#|Visit#|Other medical features related to patient and visit|label (disease stage)

I am interested in using a patient's past visit data in order to predict the current disease stage. The problem is that not all patients have the same number of visits, so I can't simply append the past visit information to predict the label of a later visit, like below:

concat(Patient #n 1st visit (X = all input features) | label of this visit | Patient #n 2nd visit (X = all input features)), and then try to predict the label for the 2nd visit using the previous visit's information.

In the example above, the number of past visits is 1, but I have a varying number of visits for each patient. How can I tackle this problem?

Erwan · Answer

The goal is to predict the current stage for one patient, so for the classification part you must have a single instance for every patient. The question is therefore how to transform features that are repeated for every visit into a fixed number of features.

The standard option is to use expert knowledge to engineer the features. For example, some features might only be relevant at a certain point in time and no longer relevant later, so for these you can just keep the latest value. For other features the evolution over time might be significant, so something like the average increase over the past N months might make sense. The idea is to "summarize" any number of visits into a fixed number of features.
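
As a rough sketch of this summarizing idea, assuming pandas and a made-up numeric feature called `marker` (the column names and values below are invented for illustration, not taken from the question's dataset), each patient's earlier visits can be collapsed into one fixed-length row, with the stage at the latest visit as the target:

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
visits = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 3],
    "visit":   [1, 2, 3, 1, 2, 1],
    "marker":  [5.0, 6.0, 8.0, 3.0, 3.5, 7.0],
    "stage":   [0, 1, 2, 0, 0, 1],
})
visits = visits.sort_values(["patient", "visit"])

rows = []
for patient, g in visits.groupby("patient"):
    past = g.iloc[:-1]     # every visit except the one to predict
    current = g.iloc[-1]   # the visit whose stage is the target
    has_past = len(past) > 0
    rows.append({
        "patient": patient,
        "n_past_visits": len(past),
        # "keep the latest value" style features
        "last_marker": past["marker"].iloc[-1] if has_past else float("nan"),
        "last_stage": past["stage"].iloc[-1] if has_past else float("nan"),
        # "evolution across time" style features
        "mean_marker": past["marker"].mean(),
        "mean_change": past["marker"].diff().mean(),
        "target_stage": current["stage"],
    })

# One row per patient, same columns regardless of how many visits they had.
features = pd.DataFrame(rows).set_index("patient")
print(features)
```

Patients with no earlier visit end up with missing values in the history columns, so they would need either imputation or a model that tolerates missing inputs. Which summaries are worth computing is exactly where the expert knowledge comes in.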

A more advanced option would be to "vectorize" your features, i.e. generate some kind of fixed-size embedding of the visit history using an unsupervised deep learning method (but that's not my area of expertise).

Dirk Nachbar · Answer

For visit $t$ of patient $i$, I would predict $y_{i,t} = f(x_{i,t-1})$, so I would only look back to the patient's previous visit.
Alternatively, you can use window functions over the past visits and keep, for example, the mean or max of $x$.
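
A minimal sketch of both variants, assuming pandas and the same kind of invented toy columns (`marker` is a hypothetical feature, not one from the question): a per-patient `shift` gives the previous visit's values, and shifting before an expanding window gives statistics over all earlier visits without leaking the current one.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
visits = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 3],
    "visit":   [1, 2, 3, 1, 2, 1],
    "marker":  [5.0, 6.0, 8.0, 3.0, 3.5, 7.0],
    "stage":   [0, 1, 2, 0, 0, 1],
})
visits = visits.sort_values(["patient", "visit"])
by_patient = visits.groupby("patient")

# Previous-visit features and label, i.e. x_{i,t-1} and y_{i,t-1}.
visits["prev_marker"] = by_patient["marker"].shift(1)
visits["prev_stage"] = by_patient["stage"].shift(1)

# Window statistics over all earlier visits: shift first so that the
# current visit is excluded, then take an expanding mean / max.
visits["past_mean_marker"] = by_patient["marker"].transform(
    lambda s: s.shift(1).expanding().mean()
)
visits["past_max_marker"] = by_patient["marker"].transform(
    lambda s: s.shift(1).expanding().max()
)

# First visits have no history, so they are dropped here.
train = visits.dropna(subset=["prev_marker"])
print(train)
```

Here every visit after the first becomes one training row, so a patient with many visits contributes several rows. Replacing `expanding()` with `rolling(k, min_periods=1)` would restrict the window to the last `k` earlier visits.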
