TransWikia.com

How to cluster government census data in order to group Metropolitan statistical areas

Data Science Asked by Snorrlaxxx on July 31, 2020

I have collected census data for 2012 – 2018 and want to apply clustering algorithms in order to compare metropolitan statistical areas (MSAs). Ideally, once I run the clustering algorithm, I would like to see which MSAs are comparable to one another.

The features I have chosen to drive the clustering are below:

'Bachelors+',
'Estimate  Total  $10,000 to $14,999',
'Estimate  Total  $100,000 to $124,999',
'Estimate  Total  $125,000 to $149,999',
'Estimate  Total  $15,000 to $19,999',
'Estimate  Total  $150,000 to $199,999',
'Estimate  Total  $20,000 to $24,999',
'Estimate  Total  $200,000 or more',
'Estimate  Total  $25,000 to $29,999',
'Estimate  Total  $30,000 to $34,999',
'Estimate  Total  $75,000 to $99,999',
'Median Age',
'Median Gross rent as % of household inc',
'Number of educational and health service workers',
'Number of finance and real estate workers',
'Number of people in management, business, science, and arts',
'Number of service workers',
'Number of tech workers',
'Pct Asian',
'Pct Black',
'Pct Other Race',
'Pct White',
'Total Population',
'Total Population over 25'

My data is at the tract level for every MSA in the United States from 2012 – 2018. Would I first need to aggregate the data so that I have the above features for each MSA, and then run the clustering algorithm on that?

From there, how do I identify the MSAs by cluster?

One Answer

If you want to measure the distance between MSAs, then yes, I think it would be best to first aggregate your features so that each instance (row) represents one MSA. You will then have an $n \times m$ matrix, where $n$ is the number of MSAs and $m$ is the number of features you end up with.
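As a rough sketch of that aggregation step, assuming the tract-level data sits in a pandas DataFrame with hypothetical `msa` and `year` columns (the tiny table below is made up for illustration): counts can simply be summed across tracts, while percentages need a population-weighted average. Medians such as 'Median Age' cannot be recovered exactly from tract medians, so treating them like the percentages would only give an approximation.

```python
import pandas as pd

# Made-up tract-level rows; the `msa` and `year` columns and all values
# are illustrative assumptions, not real census data.
tracts = pd.DataFrame({
    "msa": ["A", "A", "B", "B"],
    "year": [2018, 2018, 2018, 2018],
    "Total Population": [1000, 3000, 2000, 2000],
    "Number of tech workers": [50, 250, 100, 300],
    "Pct White": [40.0, 80.0, 50.0, 70.0],
})

count_cols = ["Total Population", "Number of tech workers"]  # additive across tracts
pct_cols = ["Pct White"]                                      # must be weighted


def aggregate_to_msa(df):
    """Collapse tract rows into one row per (msa, year)."""
    # Counts: plain sums over the tracts of each MSA.
    out = df.groupby(["msa", "year"])[count_cols].sum()
    # Percentages: weight each tract by its population, then normalize
    # by the MSA's total population.
    weighted_sum = (
        df[pct_cols]
        .multiply(df["Total Population"], axis=0)
        .groupby([df["msa"], df["year"]])
        .sum()
    )
    for col in pct_cols:
        out[col] = weighted_sum[col] / out["Total Population"]
    return out.reset_index()


msa_features = aggregate_to_msa(tracts)
print(msa_features)
```

With several years of data, one option is to cluster a single year at a time by filtering on `year` before building the feature matrix.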

You can then apply your clustering algorithm. There are many to choose from; the ones I always try are:

  • K-means
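  • K-nearest neighbors (strictly a supervised method rather than a clustering algorithm, but a nearest-neighbor search can list the MSAs closest to a given one)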
  • Spectral clustering
  • DBSCAN
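A minimal sketch of trying three of the algorithms above with scikit-learn, assuming that library is acceptable here. The matrix `X` is synthetic stand-in data, and the parameter values (`n_clusters=2`, `eps=0.8`, `min_samples=3`) are illustrative guesses that would need tuning on real MSA features. The features are standardized first because a column like 'Total Population' would otherwise dominate the distance computation.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, SpectralClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the n x m MSA feature matrix:
# two well-separated groups of 20 rows with 3 features each.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 3)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 3)),
])

# Put every feature on the same scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

# Parameter values are assumptions chosen for this toy data.
models = {
    "kmeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "spectral": SpectralClustering(n_clusters=2, random_state=0),
    "dbscan": DBSCAN(eps=0.8, min_samples=3),  # labels noise points as -1
}

# fit_predict returns one cluster label per row of the input matrix.
labels = {name: model.fit_predict(X_scaled) for name, model in models.items()}

for name, lab in labels.items():
    print(name, np.unique(lab))
```

K-means and spectral clustering need the number of clusters up front, whereas DBSCAN infers it from the density parameters and may leave some rows unassigned.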

Others can be found here.

Once you have fit the clustering algorithm, you will get a cluster label for each of the $n$ rows in your input matrix. With these labels you will know which MSAs are similar given the selected set of features.
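To tie the labels back to MSA names, one approach is to keep the MSA identifier as the index of the feature table and attach the labels as a new column. The sketch below assumes pandas and scikit-learn; the MSA names, feature values, and the choice of K-means with two clusters are all made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up MSA-level features; names and numbers are illustrative only.
msa_features = pd.DataFrame(
    {
        "Median Age": [30.0, 31.0, 45.0, 46.0],
        "Number of tech workers": [9000, 9500, 1000, 1200],
    },
    index=["MSA 1", "MSA 2", "MSA 3", "MSA 4"],
)

# Standardize, cluster, and store one label per MSA (row order is preserved).
X = StandardScaler().fit_transform(msa_features)
msa_features["cluster"] = KMeans(
    n_clusters=2, n_init=10, random_state=0
).fit_predict(X)

# Group the MSA names by the cluster they were assigned to.
members = {
    cluster_id: list(group.index)
    for cluster_id, group in msa_features.groupby("cluster")
}
for cluster_id, names in members.items():
    print(cluster_id, names)
```

The cluster numbers themselves are arbitrary; only the grouping of MSAs carries meaning.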

Correct answer by JahKnows on July 31, 2020
