TransWikia.com

Data with similar mean, min and max across all columns. What could I do to build a classifier?

Data Science Asked by funkyFunk on September 3, 2021

I have data with the following columns:

    col1         col2         col3         col4         label
    7669.533073  7669.533073  7669.695497  7669.922593  1
    7669.533043  7669.533072  7669.695487  7669.922596  0

The mean is similar across all 50 columns, and so are the minimum and maximum.

I am trying to build a classifier, and the best model (a random forest) is giving me a recall of 0.55, which doesn't seem very good. Could there be anything I am missing?

I have thought about normalising the data but there seems to be no need as all columns have a similar mean and std.

Is there any statistical technique I could apply to the data to get an improved result?

Note: the data comes from a simulated crypto price, and I am trying to predict the price movement (up or down).

2 Answers

Your data points are too close to each other, so it is really tough for any ML model to learn from these inputs: it cannot tell apart rows that are almost identical yet carry different labels (1 and 0). That's why the result looks random and your score is close to one half.
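To put a number on "too close", here is a minimal sketch using only the two sample rows from the question. It measures the largest gap between the rows relative to the size of the values, and then takes the change from one column to the next. The second step assumes the columns are consecutive time steps of the simulated price, which the question does not state, so treat it as an illustration rather than a recommended fix.

```python
import numpy as np

# The two sample rows from the question (label 1 first, then label 0).
rows = np.array([
    [7669.533073, 7669.533073, 7669.695497, 7669.922593],
    [7669.533043, 7669.533072, 7669.695487, 7669.922596],
])

# Largest gap between the two rows, relative to the typical value (~7669).
abs_gap = np.abs(rows[0] - rows[1])
max_rel_gap = float((abs_gap / np.abs(rows).mean()).max())
print(f"largest relative gap between the rows: {max_rel_gap:.1e}")

# Assumption: columns are consecutive time steps. The step from one column
# to the next drops the shared ~7669 level and keeps only the movement.
steps = np.diff(rows, axis=1)
print(steps)
```

The relative gap comes out on the order of a few parts per billion, which is the scale a model would have to separate the two labels on if it is fed the raw values.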

Answered by SrJ on September 3, 2021

If the data values are this close together, the slight differences between them could be caused by, or at least masked by, measurement error. If that is the case, you won't be able to model the data accurately, because measurement error is typically random and unrelated to whatever label is attached. I am also curious about the high precision of the data: 10 significant digits, with the decimal part going down to the millionths place even though the values are in the thousands.
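A related point about those 10 significant digits: single-precision floats carry only about 7, so the differences seen in the sample would vanish if the values were ever stored as float32. Whether anything in the asker's pipeline does that is not stated in the question; the sketch below only shows the effect on the two col1 values from the sample.

```python
import numpy as np

# col1 of the two sample rows from the question.
a, b = 7669.533073, 7669.533043

# Distance between adjacent float32 values near 7669.
spacing32 = float(np.spacing(np.float32(a)))

# Do the two values still differ after conversion?
same_in_float32 = bool(np.float32(a) == np.float32(b))
same_in_float64 = bool(np.float64(a) == np.float64(b))

print(f"float32 spacing near {a}: {spacing32:.2e}")
print(f"gap between the two values: {abs(a - b):.2e}")
print(f"equal as float32: {same_in_float32}, equal as float64: {same_in_float64}")
```

If the two values compare equal in float32, the information that distinguishes the rows has been rounded away before any model sees it, so it is worth checking the dtype of the loaded data.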

Answered by Donald S on September 3, 2021
