Data Science Asked by Ben-Jamin-Griff on January 26, 2021
I have a dataset containing raw 3-axis accelerometer data collected from a user's lower leg, and I want to create a classification model (as simple as possible) that detects whether the user is sitting down or standing. I have around 50,000 events (with more coming in); these events are labeled with the correct posture and contain the raw acceleration for the duration of the posture. I've created a handful of features from the raw data (e.g. signal mean, range, frequency content, etc.) for each event, but none of them clearly separates the two postures when I visualise the dataset.
Is there a way of automatically generating useful features from labeled raw data that enables good separation between outcomes?
If this is not possible, is it best to create a feature set with all the features you can come up with and then try to find the ones that explain the most variance between the outcomes? How do you do this? I've looked at PCA and LDA, but they don't appear to 'pick' the best features; they just combine them into new components.
Finally, does anyone have any ideas on features that could explain the differences between standing and sitting lower-leg movement? I assume the lower leg can move through more extreme angles when sitting than when standing, but how do you describe this as features?
This is an active area of research called Human Activity Recognition. There are several public datasets available to cross-validate your methods, and you might want to start here: UCI HAR Dataset. There's a paper that accompanies the dataset that describes their preprocessing methods, so you'll want to have a look at that and see if anything helps in terms of feature engineering.
There are plenty of regression-based models that exist in the exercise science space, but I've had good results using 1D convolutional neural networks on very simply transformed data. We just downsampled the signal (by averaging) to 1 or 2 Hz and reduced the 3 axes to a single time series via Euclidean vector magnitude (our subjects were performing higher-intensity activities, which often cause the sensor to rotate on their limb, changing the direction of each axis' input - reducing to EVM removes that potential confounding).
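For illustration, a minimal sketch of that downsample / EVM step might look like the following (the 100 Hz input rate, 2 Hz output rate, and function name are just assumed examples - swap in your sensor's actual rates):

```python
import numpy as np

def downsample_evm(acc, fs_in=100, fs_out=2):
    """Reduce an (n_samples, 3) accelerometer array to a short 1D series.

    1. Collapse the x/y/z axes to the Euclidean vector magnitude, which is
       insensitive to the sensor rotating on the limb.
    2. Downsample by block-averaging from fs_in to fs_out Hz.

    fs_in=100 Hz is an assumed input rate; adjust to your device.
    """
    evm = np.sqrt((acc ** 2).sum(axis=1))      # per-sample magnitude
    block = fs_in // fs_out                    # samples per output point
    n_blocks = len(evm) // block
    evm = evm[: n_blocks * block]              # trim any ragged tail
    return evm.reshape(n_blocks, block).mean(axis=1)

# Example: 10 s of simulated 100 Hz data -> 20 points at 2 Hz
acc = np.random.randn(1000, 3)
series = downsample_evm(acc)
print(series.shape)  # (20,)
```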
I'm not totally sure what your data looks like - do you have transitions (from sitting to standing and back)? I wouldn't use PCA (there is some argument that LDA can work if you're careful). There's something called tICA (time-lagged independent component analysis), but I don't think that's what you want, because it reduces the dimensionality at each time step (and each of your time steps is only 3 features). You're probably going to end up with either Fourier features (see the link above) or the simple downsample / EVM approach.
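By Fourier features I mean something along these lines - summarising each fixed-length window by the magnitude spectrum (the 50 Hz rate, 128-sample window, and bin count below are assumptions borrowed from the UCI HAR setup, not requirements):

```python
import numpy as np

def fourier_features(window, fs=50, n_bins=10):
    """Summarise one fixed-length 1D window as coarse spectral features.

    Returns the mean FFT magnitude in n_bins equal-width frequency bands,
    plus the dominant (non-DC) frequency. fs and n_bins are assumed values;
    match them to your own sampling rate and window length.
    """
    mags = np.abs(np.fft.rfft(window - window.mean()))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    edges = np.linspace(0, freqs[-1], n_bins + 1)

    band_means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band_means.append(mags[mask].mean() if mask.any() else 0.0)

    dominant = freqs[np.argmax(mags[1:]) + 1]  # skip the DC bin
    return np.array(band_means + [dominant])

# Example: a 2.56 s window at 50 Hz (128 samples), as in the UCI HAR paper
window = np.random.randn(128)
print(fourier_features(window).shape)  # (11,)
```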
Another concern here is that the data may not be reliably separable because the intensity of each activity is low. How long is each "event" (I'm understanding each event to be a "bout" of activity with some duration)? My models have, in the past, had difficulty separating certain low-intensity activity bouts: if one person isn't moving while standing and another person isn't moving while sitting, the model can't tell 0 from 0 and is basically guessing. Hopefully there's enough signal in there to get a good result; it just depends on your experimental setup.
Answered by Matthew on January 26, 2021