Data Science Asked by taciturno on April 25, 2021
Given:
A set of image Labels, each in string format. I've also been given a set of Categories, also as strings. ($Images \neq Categories$)
Goal: I need to map the given labels to the given categories, i.e. to "squeeze" the label set onto the category set.
Toy example: given two sets, Labels = ['apples', 'juice', 'sun', 'volleyball player', 'birds', 'trees'] and Categories = ['fruits', 'summer'], the result will be a simple dictionary dict keyed by the elements of the Categories set:
dict['fruits'] = ['apples', 'juice', 'trees']; dict['summer'] = ['sun', 'volleyball player', 'birds']
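To make the target concrete, here is the toy example written out as Python data, together with a small check of what a "valid" result looks like. The helper name `is_valid_mapping` is made up for illustration, and it assumes every label goes to exactly one category, as in the toy example.

```python
labels = ["apples", "juice", "sun", "volleyball player", "birds", "trees"]
categories = ["fruits", "summer"]

# Desired output from the toy example: category -> list of labels.
expected = {
    "fruits": ["apples", "juice", "trees"],
    "summer": ["sun", "volleyball player", "birds"],
}


def is_valid_mapping(mapping, labels, categories):
    """Hypothetical helper: keys are exactly the categories and
    every label is assigned to exactly one category."""
    assigned = [label for group in mapping.values() for label in group]
    return set(mapping) == set(categories) and sorted(assigned) == sorted(labels)


print(is_valid_mapping(expected, labels, categories))
```

A check like this says nothing about whether the assignment is semantically good; it only confirms that no label was dropped or duplicated.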
Question: is there a way to do that? There are probably many approaches of increasing complexity; it would be good to learn about all of them.
My approaches:
1. Cluster Labels using K-means or Agglomerative Clustering. Then take the cluster centroids and assign to each centroid the vector from Categories that is closest in cosine similarity. Mapped.
2. Cluster Labels and then just assume that every cluster is a document. We then get a set of topics and can likewise assign to each topic vector the closest vector from Categories. Mapped.
The problem with these approaches is word representation. I want the algorithm to be as accurate as possible, but I also understand that some words are connected to others through many contexts and have several meanings. So we need embeddings. Since BERT is currently the most powerful and interesting idea in word representation, I can use BERT and hope it will be the most effective approach (instead of word2vec).
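For reference, the K-means variant described above could be sketched roughly like this. It is only a sketch under loud assumptions: the 2-D vectors are invented by hand so that the example runs without any model (real vectors would come from an embedding model such as BERT or word2vec), the function names are hypothetical, and the K-means here is a bare-bones version that uses the first k vectors as initial centroids and assumes k is no larger than the number of labels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, max_iter=50):
    """Bare-bones K-means; initial centroids are the first k vectors (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every vector to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: nothing changed since the last pass
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def map_labels_to_categories(label_vecs, category_vecs, k):
    """Cluster the labels, then give each cluster the category whose vector
    is closest to the cluster centroid in cosine similarity."""
    names = list(label_vecs)
    assignment, centroids = kmeans([label_vecs[n] for n in names], k)
    mapping = {cat: [] for cat in category_vecs}
    for name, cluster in zip(names, assignment):
        best = max(
            category_vecs,
            key=lambda cat: cosine(centroids[cluster], category_vecs[cat]),
        )
        mapping[best].append(name)
    return mapping


# Made-up 2-D "embeddings", NOT produced by any real model.
label_vecs = {
    "apples": [0.9, 0.1],
    "juice": [0.8, 0.3],
    "sun": [0.1, 0.9],
    "volleyball player": [0.2, 0.8],
    "birds": [0.3, 0.7],
    "trees": [0.7, 0.2],
}
category_vecs = {"fruits": [1.0, 0.1], "summer": [0.1, 1.0]}

mapping = map_labels_to_categories(label_vecs, category_vecs, k=2)
print(mapping)
# {'fruits': ['apples', 'juice', 'trees'], 'summer': ['sun', 'volleyball player', 'birds']}
```

Note that nothing forces different clusters onto different categories: two centroids can land on the same category, and a category can end up empty. With real embeddings the quality of the result depends almost entirely on how the vectors were produced, which is exactly the word-representation concern raised above.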
A few more questions (answers optional, but appreciated):
Since I’m not a pro in DS yet, any help/comments/hints are appreciated.