
Can features that are the same in every sample contribute to learning?

Data Science Asked by Kalanos on September 1, 2021

For simplicity, let’s say that I am monitoring 4 sensors for an ongoing metric.

The first column is the sensor ID, the second column is the sensor type, and the third column is the metric reading.

[
  [
    [0, 0, 0.123],
    [1, 0, 0.456],
    [2, 1, 0.789],
    [3, 1, 0.555]
  ],
  [
    [0, 0, 0.987],
    [1, 0, 0.654],
    [2, 1, 0.321],
    [3, 1, 0.666]     
  ],
  [
    [0, 0, 0.591],
    [1, 0, 0.824],
    [2, 1, 0.760],
    [3, 1, 0.888]      
  ]
]

If the first two columns hold the same values in every sample, will a CNN or an LSTM be able to learn anything from them, or are they just redundant?

In my mind, the sensor ID could correspond to a position on a map where different metrics are observed, or the sensor type could correspond to some kind of sensitivity in the metric. But am I just kidding myself if these values are the same in every sample?

I don’t want to provide unnecessary dimensionality to the model.

One Answer

A feature has no value if it is the same across the entire training set.

Let's say you are using a global health dataset to predict life expectancy; then a country code can be a useful feature, since it might carry hidden information.


But if you are doing the same analysis for a single country, e.g. India, then a country feature that takes only one value, India, is of no use.
It shows no variance; compared to the previous example, the variance has shifted from the country level down to the state level.

In your data, I can see multiple sensor IDs. If every sample will always contain these same four IDs, and there is no explicit assumption that the third column depends on the first, you can remove the feature.
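To make this concrete, here is a minimal NumPy sketch (assuming the three samples from the question) that checks which columns have zero variance across samples and drops them:

```python
import numpy as np

# The samples from the question: shape (3 samples, 4 sensors, 3 columns),
# where the columns are [sensor ID, sensor type, metric value].
data = np.array([
    [[0, 0, 0.123], [1, 0, 0.456], [2, 1, 0.789], [3, 1, 0.555]],
    [[0, 0, 0.987], [1, 0, 0.654], [2, 1, 0.321], [3, 1, 0.666]],
    [[0, 0, 0.591], [1, 0, 0.824], [2, 1, 0.760], [3, 1, 0.888]],
])

# A column carries no information if its variance over the sample axis
# is zero for every sensor.
variance_across_samples = data.var(axis=0)                       # shape (4, 3)
constant_columns = np.all(variance_across_samples == 0, axis=0)  # shape (3,)
print(constant_columns)  # [ True  True False] -> ID and type never vary

# Keep only the informative columns.
reduced = data[:, :, ~constant_columns]
print(reduced.shape)  # (3, 4, 1)
```

The sensor ID and type columns come out as constant, so only the metric column survives; the ID information is still implicit in each sensor's fixed position within the sample.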

Answered by 10xAI on September 1, 2021
