Data Science Asked by Lossa on December 24, 2020
Just for fun, I am currently trying to find suitable locations for new stores. So far I have taken the sites of the current stores and assigned surrounding variables to them. These features include, for example, point-of-interest density, population density, and region popularity. In total I have 9,000 points with 100 dimensions each. 1,000 of these points already contain stores; the remaining 8,000 do not.
In the next step I want to perform dimensionality reduction using PCA. However, I am not sure how to proceed afterwards. Should I try to cluster the points? Or how can I "predict" which of the points are suitable candidates for new stores? Maybe using some kind of skip-gram model?
Hoping to get some advice :)
Cheers,
Tom
Are you sure PCA is the correct way to go? It's an analytical problem, and being able to interpret the results is very important. Principal components are mixtures of all the original features, which makes them harder to interpret.
How about looking at the correlation between the number of stores and the nearby features? Find out what makes a good location: which features are the most important? Run forward or backward selection, for example, or use another model or feature-selection technique.
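As a rough illustration of the correlation idea, here is a minimal sketch that ranks features by their Pearson correlation with a binary "has a store" flag. Everything in it is an assumption for illustration: the data is synthetic, the three feature names are made up from the examples in the question, and `rank_features_by_correlation` is a hypothetical helper, not a library function.

```python
import numpy as np

def rank_features_by_correlation(X, has_store, feature_names):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(has_store, dtype=float)
    # Center the features and the binary store flag.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; report a correlation of 0 for them.
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(corr))
    return [(feature_names[i], float(corr[i])) for i in order]

# Synthetic stand-in for the real data (assumption: 3 features instead of 100).
rng = np.random.default_rng(0)
n = 9000
names = ["poi_density", "population_density", "region_popularity"]
X = rng.normal(size=(n, 3))
# In this toy data, stores are more likely where poi_density is high.
store_prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 2.0)))
has_store = rng.random(n) < store_prob

ranking = rank_features_by_correlation(X, has_store, names)
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```

A simple correlation only captures linear, one-feature-at-a-time effects, so treat the ranking as a first look rather than a replacement for forward or backward selection.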
It's not a pure machine learning case you have here. It's a typical analytical data science problem.
If you still want to do classification, just train a model. You have POI features and some others, and you know whether there is a store or not :) I might not fully understand the problem here. Build a training set in which 50% of the locations have a store and 50% do not, train a classifier on it, and then classify the other areas.
I'd still start by visualizing and understanding the data, as I mentioned first. It's much underrated, and it's the way to start solving most problems.
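The balanced-training idea could look roughly like the sketch below. It is only an illustration under assumptions: the data is synthetic (5 features instead of 100), the classifier is a plain gradient-descent logistic regression written in NumPy rather than any particular library model, and `fit_logistic` is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        grad = sigmoid(X @ w + b) - y      # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in: 9000 locations, 1000 of which already have a store.
rng = np.random.default_rng(0)
X = rng.normal(size=(9000, 5))
noisy_score = X @ np.array([1.5, -1.0, 0.0, 0.0, 0.5]) + rng.normal(size=9000)
has_store = np.zeros(9000, dtype=bool)
has_store[np.argsort(-noisy_score)[:1000]] = True

# Balanced training set: all store locations plus an equal-sized random
# sample of locations without a store.
store_idx = np.flatnonzero(has_store)
empty_idx = np.flatnonzero(~has_store)
sampled_empty = rng.choice(empty_idx, size=len(store_idx), replace=False)
train_idx = np.concatenate([store_idx, sampled_empty])

# Standardize with training-set statistics so gradient descent behaves well.
mu = X[train_idx].mean(axis=0)
sd = X[train_idx].std(axis=0)
Xs = (X - mu) / sd

w, b = fit_logistic(Xs[train_idx], has_store[train_idx].astype(float))
train_pred = sigmoid(Xs[train_idx] @ w + b) > 0.5
train_acc = (train_pred == has_store[train_idx]).mean()

# Score the remaining store-free locations; a high probability means
# "looks like the places that already have a store".
candidates = np.setdiff1d(empty_idx, sampled_empty)
probs = sigmoid(Xs[candidates] @ w + b)
top_candidates = candidates[np.argsort(-probs)[:10]]
print("training accuracy:", round(float(train_acc), 3))
print("top candidate indices:", top_candidates)
```

With a linear model like this, the learned weights in `w` stay readable per feature, which fits the point above about being able to interpret the results.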
Hope that gave you some hints,
Cheers
Answered by Carl Rynegardh on December 24, 2020