Data Science Asked by Jakubee on February 18, 2021
I’m new to this community, and hopefully my question fits in well here.
As part of my undergraduate data analytics course I have chosen to do a project on human activity recognition using smartphone data sets. As far as I can tell, this topic relates to machine learning and support vector machines. I’m not very familiar with these technologies yet, so I will need some help.
I have decided to follow this project idea (first project on the top)
The project goal is to determine what activity a person is engaging in (e.g., WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) from data recorded by a smartphone (Samsung Galaxy S II) on the subject’s waist. Using its embedded accelerometer and gyroscope, the data includes 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz.
The whole data set is provided in one folder, along with a description and feature labels. The data is divided into ‘test’ and ‘train’ files, in which it is represented in this format:
2.5717778e-001 -2.3285230e-002 -1.4653762e-002 -9.3840400e-001 -9.2009078e-001 -6.6768331e-001 -9.5250112e-001 -9.2524867e-001 -6.7430222e-001 -8.9408755e-001 -5.5457721e-001 -4.6622295e-001 7.1720847e-001 6.3550240e-001 7.8949666e-001 -8.7776423e-001 -9.9776606e-001 -9.9841381e-001 -9.3434525e-001 -9.7566897e-001 -9.4982365e-001 -8.3047780e-001 -1.6808416e-001 -3.7899553e-001 2.4621698e-001 5.2120364e-001 -4.8779311e-001 4.8228047e-001 -4.5462113e-002 2.1195505e-001 -1.3489443e-001 1.3085848e-001 -1.4176313e-002 -1.0597085e-001 7.3544013e-002 -1.7151642e-001 4.0062978e-002 7.6988933e-002 -4.9054573e-001 -7.0900265e-001
And that’s only a very small sample of what the file contains.
I don’t really know what this data represents or how it can be interpreted. Also, what tools will I need for analyzing, classifying, and clustering the data?
Is there any way I can put this data into Excel with labels included and, for example, use R or Python to extract sample data and work on it?
Any hints/tips would be much appreciated.
The data set definitions are on the page here:
Attribute Information at the bottom
Alternatively, look inside the ZIP folder: the file named activity_labels has the activity names for the class labels, and the file named features has your column headings. Make sure you read the README carefully; it has some good info in it. You can easily bring a .csv file into R using the read.csv command.
For example, if you name your file samsungdata, you can open R and run this command:
data <- read.csv("directory/where/file/is/located/samsungdata.csv", header = TRUE)
Or, if you are already inside the working directory in R, you can just run the following:
data <- read.csv("samsungdata.csv", header = TRUE)
Here the name data can be changed to whatever you want to call your data set.
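Since the question also mentions Python, here is a minimal sketch of the same idea there, offered as an illustration rather than a definitive recipe. It assumes the rows are whitespace-separated numbers like the sample in the question, and it writes them out as a CSV with a header row so the result can be opened in Excel. The function name rows_to_csv and the placeholder column names are hypothetical; in practice the column names would come from the label files in the ZIP.

```python
import csv
import io


def rows_to_csv(lines, column_names, out):
    """Write whitespace-separated numeric rows to `out` as CSV with a header.

    `lines` is any iterable of text lines (for example an open file),
    `column_names` is a list of header labels (hypothetical here), and
    `out` is a writable text stream.
    """
    writer = csv.writer(out)
    writer.writerow(column_names)
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # float() understands scientific notation such as '2.5717778e-001'
        values = [float(field) for field in fields]
        if len(values) != len(column_names):
            raise ValueError(
                f"line {line_number}: expected {len(column_names)} values, "
                f"got {len(values)}"
            )
        writer.writerow(values)


# Small demonstration using the first three values from the sample row
# in the question; the column names are placeholders.
sample = ["2.5717778e-001 -2.3285230e-002 -1.4653762e-002"]
buffer = io.StringIO()
rows_to_csv(sample, ["feature_1", "feature_2", "feature_3"], buffer)
print(buffer.getvalue())
```

To convert a real file, the same function can be given open file objects instead of the in-memory sample, opening the output with newline="" as the csv module recommends. The file names would be whatever the extracted files are called on your machine.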
Correct answer by MCP_infiltrator on February 18, 2021
It looks like this data set (or a very similar one) is used in Coursera courses. Cleaning this data set is a task in Getting and Cleaning Data, and it is also used as a case study in Exploratory Data Analysis. The video for this case study is available among the week 4 videos of the EDA courseware. It might help you get started with this data.
Answered by Damian Melniczuk on February 18, 2021
This looks to me like a typical time series classification case. After cleaning and normalizing the data, you could build a simple LSTM model in TensorFlow to learn the time series, with 2 dense layers downstream to do the classification. Just Google "time series classification" and you'll get what you're looking for.
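The answer does not say which normalization to use, so as one possible reading, here is a minimal sketch of z-score standardization (each column rescaled to mean 0 and standard deviation 1) in plain Python. The function names fit_zscore and apply_zscore are hypothetical, and the choice of z-scores is an assumption rather than something the answer prescribes.

```python
import statistics


def fit_zscore(rows):
    """Compute per-column means and population standard deviations."""
    columns = list(zip(*rows))
    means = [statistics.mean(column) for column in columns]
    stdevs = [statistics.pstdev(column) for column in columns]
    return means, stdevs


def apply_zscore(rows, means, stdevs):
    """Standardize rows using previously fitted means and deviations.

    Columns with zero deviation are mapped to 0.0 to avoid dividing by zero.
    """
    return [
        [
            (value - mean) / stdev if stdev > 0 else 0.0
            for value, mean, stdev in zip(row, means, stdevs)
        ]
        for row in rows
    ]


# Toy data: three rows, two columns (the second column is constant).
train_rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, stdevs = fit_zscore(train_rows)
normalized = apply_zscore(train_rows, means, stdevs)
print(normalized)
```

Fitting on the ‘train’ rows and then calling apply_zscore on the ‘test’ rows with the same means and deviations keeps the two sets on a common scale.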
Answered by tehem on February 18, 2021