TransWikia.com

Compare job ads with a given set of categories (each consisting of terms)

Data Science Asked by Spooz on January 18, 2021

For a recent research paper, I plan to do the following and would appreciate your advice.

I have obtained a set of a few thousand job ads. I now want to analyse whether, and how, these ads include content that another research paper previously specified as individual ‘categories’. More precisely, there are about 15 existing categories, each explained by a description of 2-4 sentences.

Now I want to find out which job ads, and how many, cover the aspects described in each of the 15 categories. For example, a result could be that job ad #1 contains content matching (or coming close to) the descriptions of categories 2, 5 and 8, but lacks content that would relate it to any of the remaining categories.
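As a rough unsupervised baseline for this kind of per-ad output, one could represent every ad and every category description as a TF-IDF vector and flag a category whenever the cosine similarity passes a threshold. The following is a minimal, stdlib-only sketch under stated assumptions: the tokenizer, the stopword list, the threshold of 0.2, the function names and all example texts are hypothetical. A real pipeline would more likely use scikit-learn (`TfidfVectorizer` plus `sklearn.metrics.pairwise.cosine_similarity`).

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real projects would use a fuller list).
STOPWORDS = {"a", "and", "are", "for", "in", "our", "the", "we", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_categories(ads, categories, threshold=0.2):
    """Map each ad index to the category ids whose description is similar enough."""
    ids = list(categories)
    # Fit IDF on ads and category descriptions together so they share one vocabulary.
    vectors = tfidf_vectors(ads + [categories[c] for c in ids])
    ad_vecs, cat_vecs = vectors[: len(ads)], vectors[len(ads):]
    return {
        i: [c for c, cv in zip(ids, cat_vecs) if cosine(av, cv) >= threshold]
        for i, av in enumerate(ad_vecs)
    }

# Hypothetical category descriptions and ads, for illustration only.
categories = {
    2: "Teamwork and collaboration with colleagues across departments.",
    5: "Programming skills in Python and SQL for data analysis.",
    8: "Customer communication and presentation skills.",
}
ads = [
    "We seek a data analyst with strong Python and SQL programming skills.",
    "Join our team: collaboration with colleagues and customer communication are key.",
]
print(match_categories(ads, categories))  # -> {0: [5], 1: [2, 8]}
```

The threshold trades recall against precision and would need tuning on a few hand-checked ads. Because the category descriptions are only 2-4 sentences long, exact term overlap will miss paraphrases (here "team" does not match "teamwork"), which is where embeddings or a supervised model can help.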

If you have any references or advice on how to approach this task, please let me know. I suspect that a supervised learning approach would work best.

Best,

Spooz.

One Answer

To understand the data, I would recommend starting with an unsupervised approach. For example, you could:

  • compute TF-IDF vectors and build a simple tag cloud;
  • run Latent Dirichlet Allocation (LDA) to get an overview of the latent topics in your data (similar to clustering) and interpret them as candidate categories;
  • train a word embedding (e.g. word2vec) on your data, then apply a dimensionality reduction such as PCA to explore it visually (you may find semantic and syntactic clusters).
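To make the LDA bullet concrete, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It is an illustrative sketch, not a tuned implementation: the hyperparameters (`alpha`, `beta`, number of topics, iterations), the toy token lists and the function names are assumptions. In practice, scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would be the usual choice.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # n_dk
    topic_word = [Counter() for _ in range(n_topics)]   # n_kw
    topic_total = [0] * n_topics                        # n_k
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional, proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """Most frequent words per topic (skipping zero counts left by decrements)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

# Hypothetical pre-tokenized ads (stopwords already removed).
docs = [
    ["python", "sql", "data", "analysis"],
    ["sql", "data", "programming", "python"],
    ["teamwork", "colleagues", "collaboration"],
    ["colleagues", "teamwork", "meetings", "collaboration"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word))  # top words per topic; exact result depends on the seed
print(doc_topic)              # topic counts per ad
```

Each row of `doc_topic` gives an ad's topic mixture, and reading the top words of each topic is the "interpret" step from the LDA bullet; whether the topics line up with the 15 predefined categories has to be checked by hand.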

After this unsupervised exploration, you could move on to (supervised) text classification, using either classical approaches (e.g. bag-of-words models) or neural networks.
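Once some ads have been labelled by hand, the supervised step can be framed as multi-label classification: train one binary classifier per category on bag-of-words features (one-vs-rest), so an ad can receive several categories or none. The sketch below uses multinomial Naive Bayes as a simple classical choice; that choice is an assumption rather than something the answer prescribes, and the training texts, labels and function names are invented for illustration. With scikit-learn one would typically combine `CountVectorizer` or `TfidfVectorizer` with `OneVsRestClassifier`.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Naive bag-of-words: lowercase alphabetic tokens with their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

class BinaryNaiveBayes:
    """Multinomial Naive Bayes for one yes/no label, with add-one smoothing."""

    def fit(self, bows, labels):
        self.vocab = {w for bow in bows for w in bow}
        self.log_prior, self.log_likelihood = {}, {}
        for cls in (True, False):
            members = [bow for bow, y in zip(bows, labels) if y == cls]
            if not members:
                raise ValueError("each category needs positive and negative examples")
            self.log_prior[cls] = math.log(len(members) / len(bows))
            counts = Counter()
            for bow in members:
                counts.update(bow)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_likelihood[cls] = {
                w: math.log((counts[w] + 1) / denom) for w in self.vocab
            }
        return self

    def predict(self, bow):
        """True if the positive class has the higher log-posterior score."""
        scores = {}
        for cls in (True, False):
            score = self.log_prior[cls]
            for w, n in bow.items():
                if w in self.vocab:  # words never seen in training are ignored
                    score += n * self.log_likelihood[cls][w]
            scores[cls] = score
        return scores[True] > scores[False]

def train_one_vs_rest(texts, label_sets, categories):
    """Train one binary classifier per category (one-vs-rest multi-label)."""
    bows = [bag_of_words(t) for t in texts]
    return {
        c: BinaryNaiveBayes().fit(bows, [c in labels for labels in label_sets])
        for c in categories
    }

def predict_categories(models, text):
    """Sorted ids of all categories whose classifier says yes."""
    bow = bag_of_words(text)
    return sorted(c for c, model in models.items() if model.predict(bow))

# Hypothetical hand-labelled training ads (category ids follow the question).
train_texts = [
    "python sql data",
    "sql programming data",
    "teamwork collaboration colleagues",
    "teamwork colleagues meetings",
    "customer communication presentation",
    "customer presentation clients",
]
train_labels = [{5}, {5}, {2}, {2}, {8}, {8}]
models = train_one_vs_rest(train_texts, train_labels, [2, 5, 8])
print(predict_categories(models, "data and sql programming"))  # -> [5]
print(predict_categories(models, "customer presentation and teamwork with colleagues"))  # -> [2, 8]
```

One-vs-rest keeps each category independent, which matches the desired output of "categories 2, 5 and 8 but none of the others"; a few hundred labelled ads per category would be a more realistic starting point than this toy set.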

Answered by Predicted Life on January 18, 2021

