TransWikia.com

Random Forest with 2D features

Data Science Asked on November 27, 2021

I am trying to predict the position of a specific point (the crest) in a 1D signal (an elevation profile). Until now, I have computed the gradient at every point of the signal and combined it with additional features or heuristics to find the approximate position of the expected output (the position of the crest).

This approach has its limits, however, and I’ve found that ML techniques, especially Random Forest classifiers, could perform well in this kind of situation.

I would like to train my RF to find, for a given input profile, the point (point_index) most likely to be the "output", i.e. the crest.

Yet I have only found examples of training RF models with 1D inputs (like a time series). In my case, the input data is 2D (one signal is composed of N points, with 2 features associated with each point), as in the following dataframe:

   profile_index  point_index         z             z'        crest
0              0            1 -0.885429             0          false
1              0            2 -0.820151          0.02          false
2              0            3 -0.729671          -0.1           true
3              0            4 -0.649332           0.1          false
4              1            1 -0.692186             0          false
5              1            2 -0.885429           0.1           true
6              1            3 -0.820151         -0.05          false
7              1            4 -0.649332           0.2          false

I can split the dataframe by profile and get the output point_index as a feature, but how do I handle the fact that 2 of my features are arrays?

Edit: here is another representation of my data:

   profile_index               points_z         points_z_prime    crest_index
               0     [-0.05, ..., 2.36]        [0, ..., -0.01]            150          
               1     [-0.02, ..., 4.41]        [0, ..., -0.02]            162          

(This is probably irrelevant to the method, but I work with Python and scikit-learn.)

One Answer

If the number of points in each profile is constant, you can flatten the arrays and use each element as a feature in your RF. I worked on a similar problem (if I understood yours correctly), predicting the return of a stock from its returns over a window of a fixed number of days. I used an RF this way and it performed pretty well.
If the number of points isn't fixed, I suggest using an LSTM neural network, which accepts a sequence of data (the elements can be arrays) and can predict the output you are looking for.
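For the fixed-length case, the flattening idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the profiles are synthetic (a noisy bump whose peak stands in for the crest), and the sizes `n_profiles` and `n_points`, the train/test split, and the hyperparameters are hypothetical choices, not taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical sizes: 200 profiles, each with exactly 50 points.
n_profiles, n_points = 200, 50

# Synthetic stand-in for the real data: each profile is a noisy bump
# centred on its (known) crest index.
x = np.arange(n_points)
crest_index = rng.integers(10, 40, size=n_profiles)
z = np.exp(-0.5 * ((x[None, :] - crest_index[:, None]) / 5.0) ** 2)
z += rng.normal(scale=0.02, size=z.shape)
z_prime = np.gradient(z, axis=1)  # second per-point feature (z')

# Stack the two per-point features: shape (n_profiles, n_points, 2),
# then flatten each profile into one row of n_points * 2 features.
features = np.stack([z, z_prime], axis=-1)
X = features.reshape(n_profiles, -1)
y = crest_index  # one target per profile: the crest's point index

# Simple hold-out split (assumption: first 150 profiles for training).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:150], y[:150])

predicted = model.predict(X[150:])
mean_abs_error = np.abs(predicted - y[150:]).mean()
print("mean absolute index error:", mean_abs_error)
```

Treating each index as a class means the forest can only predict crest positions it saw during training. Since the target is an ordered position, `RandomForestRegressor` on the same flattened `X` may be worth trying as well.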

Answered by mirimo on November 27, 2021
