TransWikia.com

Should I modify my data to reflect time dependence of my rules?

Data Science Asked by William Abma on March 11, 2021

Say I have 100 points. Each of these points has a date associated with it and a “yes” label. I can artificially add “no” points to create more data. Now I want to use these 100 “yes” points plus X “no” points to predict whether a given set of features (a new point) will be categorized as “yes” or “no”.

I can develop a model that does so.

Now, let’s imagine that some “rules of the game” can change. Something that was a “yes” point in 1967 becomes a “no” point after an (unknown) rule change in 1989. Most factors stay the same, but one change causes the point’s classification to change drastically.

Can I modify my data to give more importance to recent dates (that is, duplicate recent values so that, in terms of “importance”, 1 value from 2010 equals 3 values from 1990 and 5 values from 1970, for example), on the grounds that recent points better represent the rules a future point will be judged by? Or is this a terrible idea?

Basically, to simplify, if my data is 5 points, labeled “1939”, “1967”, “1980”, “1982”, “2010”, can I artificially modify it to be 12 points, “1939”, “1967”, “1980”, “1980”, “1982”, “1982”, “1982”, “2010”, “2010”, “2010”, “2010”, “2010”?

Of course, choosing the “importance” of a given more recent point is pretty difficult to begin with.
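To make the duplication idea concrete, here is a minimal sketch in plain Python. It shows that repeating a point k times has the same effect on a weighted statistic as keeping one copy with weight k, which is why per-sample weights are a common substitute for literal duplication (many scikit-learn estimators, for instance, accept a `sample_weight` argument in `fit`). The helper `exp_recency_weights` and its `half_life` parameter are hypothetical, just one possible way to pick the “importance” instead of choosing counts by hand.

```python
from collections import Counter

# The five dated points from the question and the duplicated version.
years = [1939, 1967, 1980, 1982, 2010]
duplicated = [1939, 1967, 1980, 1980, 1982, 1982, 1982,
              2010, 2010, 2010, 2010, 2010]

# Duplicating a point k times is equivalent to giving it weight k.
counts = Counter(duplicated)
weights = [counts[y] for y in years]  # one weight per original point


def weighted_mean(values, weights):
    """Mean of `values` where each value counts `weights[i]` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def exp_recency_weights(years, reference_year, half_life):
    """Hypothetical scheme: a point's weight halves every `half_life` years
    of distance from `reference_year` (an assumption, not from the question)."""
    return [0.5 ** ((reference_year - y) / half_life) for y in years]


print(weights)
print(weighted_mean(years, weights), sum(duplicated) / len(duplicated))
print(exp_recency_weights(years, reference_year=2010, half_life=20))
```

Weights also allow non-integer importance, which duplication cannot express, and they avoid leaking copies of the same point into both the training and the validation split.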

One Answer

Maybe I misunderstood something, but if you have a feature vector with N features, it would be fine to add one more feature that reflects the time. Then there could be two instances with the same first N features but a different (N+1)-th feature, and as a result the answer would differ too: "yes" for one year and "no" for another. For some instances the answer will stay the same regardless of the year, which is also possible. So I guess your idea might work, provided there is an actual connection and a meaningful "pattern" in the real-life data.
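As a rough illustration of the extra-feature idea, the sketch below uses a toy 1-nearest-neighbour classifier in plain Python. The data, the `predict_1nn` helper and the `year_scale` parameter are all made up for the example; the only point is that two instances with identical first N features can receive different labels once the year is part of the feature vector.

```python
# Hypothetical toy data: (feature_1, feature_2, year) -> label.
train = [
    ((1.0, 0.0, 1967), "yes"),
    ((1.0, 0.0, 1995), "no"),  # same features, after an assumed rule change
    ((0.0, 1.0, 1970), "no"),
    ((0.0, 1.0, 2005), "no"),  # label unaffected by the rule change
]


def predict_1nn(point, train, year_scale=10.0):
    """Return the label of the nearest training point.

    The year is divided by `year_scale` (an arbitrary choice here) so that
    it is on a scale comparable to the other features.
    """
    def dist(a, b):
        *feats_a, year_a = a
        *feats_b, year_b = b
        d = sum((u - v) ** 2 for u, v in zip(feats_a, feats_b))
        return d + ((year_a - year_b) / year_scale) ** 2

    return min(train, key=lambda item: dist(point, item[0]))[1]


# Same first two features, different year, different prediction.
print(predict_1nn((1.0, 0.0, 1970), train))
print(predict_1nn((1.0, 0.0, 2000), train))
# Features whose label never changed give the same answer for any year.
print(predict_1nn((0.0, 1.0, 1975), train))
```

In this setup the model has to learn from the data where the rule change happened, so it only helps if there are enough labeled points on both sides of the change.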

Answered by Lana on March 11, 2021
