Text topic classification in TensorFlow

Question

I want to create a CNN in TensorFlow that does the following:
Classify a recipe headline and find out its topic. For instance, "super yummy cheesy cake" should result in "cheese cake", and so on.

I am thinking of going with TensorFlow, but I need some help getting started.

My strategy is as follows:

Normalize the headlines so that, for instance, "cheesy" becomes "cheese" and "cheesecake" becomes "cheese cake".
Build a dataset like:

super yummy cheesecake | cheese cake 
summer strawberry cake | strawberry cake

Train the model to learn what matters for the topic and what is just additional information.

The way the dataset is modeled, I have no static labels, as far as I understand. This makes things complicated, right?

As this is my first AI experiment with TensorFlow, I don't really know whether this will work out or whether I should go with another strategy, so I need your help.

Vlad-HC · Answer

To me this does not look like a TensorFlow task at all, at least not in the first place.

The "normalize headlines" task is lemmatization. spaCy does a nice job here and has great documentation. Here is an example; have a look at the "lemma" property.
Use food2vec as a database of topic names.
Parse the sentence with spaCy and look the phrase up in food2vec. The lookup should be done phrase by phrase, not word by word: first look up a 3-word phrase in the dictionary; if it is not found, a 2-word phrase; then a single word.

This should be enough to solve your task.
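The longest-phrase-first lookup described above could be sketched roughly like this. It is a minimal sketch, not a definitive implementation: the NORMALIZE table is a hypothetical stand-in for spaCy lemmatization, TOPICS is a hypothetical stand-in for the food2vec vocabulary, and the function names are made up for illustration.

```python
# Hypothetical stand-ins: a real setup would take lemmas from spaCy and
# topic names from the food2vec vocabulary instead of these tiny tables.
NORMALIZE = {"cheesy": "cheese", "cheesecake": "cheese cake", "cakes": "cake"}
TOPICS = {"cheese cake", "strawberry cake", "cake"}


def normalize(headline):
    """Lowercase the headline and map each word to its normalized form."""
    words = [NORMALIZE.get(word, word) for word in headline.lower().split()]
    # Re-split so that a replacement like "cheese cake" yields two tokens.
    return " ".join(words).split()


def find_topic(headline, topics=TOPICS, max_len=3):
    """Return the first known phrase, trying 3-word, then 2-word, then 1-word windows."""
    tokens = normalize(headline)
    for size in range(min(max_len, len(tokens)), 0, -1):
        for start in range(len(tokens) - size + 1):
            phrase = " ".join(tokens[start:start + size])
            if phrase in topics:
                return phrase
    return None  # no known topic in this headline


if __name__ == "__main__":
    for title in ["super yummy cheesecake", "summer strawberry cake"]:
        print(title, "->", find_topic(title))
```

Trying the longer windows first keeps "strawberry cake" from being reduced to plain "cake" when both are in the topic list.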

Alexandre Passos · Answer

You can frame this as a sequence-to-sequence prediction problem, similar to translation and summarization. The neural translation with attention colab is probably a really good place to start.
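To make the framing concrete, here is a minimal sketch of how the "headline | topic" lines from the question might be turned into source/target token sequences for such a model. It only covers data preparation in plain Python, not the model itself; the marker tokens, function names, and padding scheme are assumptions chosen for illustration and are not taken from the colab.

```python
# The two example rows from the question, in "headline | topic" form.
RAW = """super yummy cheesecake | cheese cake
summer strawberry cake | strawberry cake"""

# Assumed special tokens; any consistent markers would do.
START, END, PAD = "<start>", "<end>", "<pad>"


def parse_pairs(raw):
    """Split each line into (source tokens, target tokens with start/end markers)."""
    pairs = []
    for line in raw.splitlines():
        headline, topic = (part.strip() for part in line.split("|"))
        pairs.append((headline.split(), [START] + topic.split() + [END]))
    return pairs


def build_vocab(sequences):
    """Assign an integer id to every token, reserving 0 for padding."""
    vocab = {PAD: 0}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(seq, vocab, length):
    """Map tokens to ids and right-pad with the padding id up to `length`."""
    ids = [vocab[token] for token in seq]
    return ids + [vocab[PAD]] * (length - len(ids))


pairs = parse_pairs(RAW)
src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)
src_len = max(len(src) for src, _ in pairs)
tgt_len = max(len(tgt) for _, tgt in pairs)
encoded = [
    (encode(src, src_vocab, src_len), encode(tgt, tgt_vocab, tgt_len))
    for src, tgt in pairs
]
```

Because the target is generated token by token from a vocabulary, the topic does not have to come from a fixed list of class labels, which is the concern raised in the question.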
