TransWikia.com

SMOTE and oversampling with constraints

Data Science Asked by Titus Pullo on April 3, 2021

I’m trying to apply SMOTE to a dataset that has time constraints. I have information about users visiting a website. Some features are tied together by time constraints: for example, given a user's first and last visit to the website, the first-visit timestamp is always less than or equal to the last-visit timestamp.
If I apply SMOTE (or SMOTENC for categorical features), I end up with synthetic samples in which the last visit occurred before the first visit. Such a sample cannot exist in the real world, so it can negatively affect the performance of the model.
Is there a way to apply SMOTE while imposing certain rules? Alternatively, are there oversampling techniques that can deal with this problem?

One Answer

One option is to do something closer to bootstrapping, i.e. random oversampling with replacement. Because that only re-samples existing rows, every added sample already satisfies the constraints.
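As a minimal sketch of the bootstrapping idea, assuming each row is a hypothetical `(first_visit, last_visit)` pair of timestamps and that label `1` is the minority class, the helper below (`bootstrap_oversample` is an illustrative name, not a library function) draws extra minority rows with replacement:

```python
import random


def bootstrap_oversample(rows, labels, minority_label, n_extra, seed=0):
    """Append n_extra minority rows drawn with replacement from the existing ones."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(rows, labels) if label == minority_label]
    extra = rng.choices(minority, k=n_extra)  # copies of real rows only
    return rows + extra, labels + [minority_label] * n_extra


# Hypothetical data: (first_visit, last_visit) as Unix-style timestamps.
rows = [(100, 250), (120, 120), (300, 900), (50, 75), (400, 410)]
labels = [0, 0, 0, 1, 1]

new_rows, new_labels = bootstrap_oversample(rows, labels, minority_label=1, n_extra=1)

# Every row is a copy of an observed row, so the constraint still holds.
assert all(first <= last for first, last in new_rows)
```

If imbalanced-learn is already in use, its `RandomOverSampler` covers the same idea. The trade-off is that no new feature values are created, so the model sees exact duplicates of minority rows.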

Another option is to generate more synthetic samples than you need, then prune those that violate the constraints.
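The generate-then-prune idea could look roughly like the sketch below. The names `generate_valid`, `first_before_last`, and `fake_batch` are made up for illustration; `fake_batch` is a stand-in for whatever produces the synthetic rows (for example, the new rows returned by SMOTE or SMOTENC), and rows are again assumed to be `(first_visit, last_visit)` pairs.

```python
def first_before_last(row):
    """Constraint from the question: first visit must not come after last visit."""
    first_visit, last_visit = row
    return first_visit <= last_visit


def generate_valid(generate_batch, is_valid, n_needed, max_rounds=100):
    """Keep requesting batches and drop invalid rows until n_needed valid rows exist."""
    kept = []
    for _ in range(max_rounds):
        kept.extend(row for row in generate_batch() if is_valid(row))
        if len(kept) >= n_needed:
            return kept[:n_needed]
    raise RuntimeError("could not generate enough valid samples")


def fake_batch():
    # Stand-in generator (assumption): a fixed batch, two rows of which are invalid.
    return [(60, 80), (390, 70), (75, 75), (410, 400)]


valid_rows = generate_valid(fake_batch, first_before_last, n_needed=3)
print(valid_rows)
```

Since pruning discards part of each batch, the generator has to be asked for more rows than the final target. The `max_rounds` cap is there so the loop stops if the generator rarely produces valid rows.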

Answered by Brian Spiering on April 3, 2021
