Data Science Asked on December 21, 2020
I want to train a binary classification algorithm for spam detection using a labeled dataset. The dataset has the following features:
Email address, text message (split into subject and corpus), date
An example of data is:
Email | Subject | Corpus | Date
[email protected] | Example | this is just an example of my dataset | 2020/08/20
What I would like to do is transform the data features into real numbers and binarize the email addresses.
As the algorithm, I was thinking of SVM and/or Naïve Bayes.
My difficulty, however, is how to transform the data features into real numbers so that the classifier has more input features to work with.
I am using Python.
Could you please give me an example of how to do it?
The term you are looking for is text classification. There are many tutorials and papers on the topic, for example this tutorial and this survey.
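As a rough illustration of what such a text-classification pipeline could look like for the columns in the question, here is a minimal sketch using pandas and scikit-learn. The toy rows, the `label` column, the `add_features` helper, and the choice to use only the sender's domain and the weekday are all assumptions made for the example, not something given in the question. TF-IDF turns the subject and corpus into real-valued vectors, and one-hot encoding binarizes the email domain and the weekday.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Toy data invented purely for illustration (label: 1 = spam, 0 = not spam).
df = pd.DataFrame({
    "Email": ["alice@example.com", "promo@offers.example",
              "bob@example.com", "win@offers.example"],
    "Subject": ["Meeting notes", "You won a prize",
                "Lunch tomorrow", "Claim your free prize now"],
    "Corpus": ["here are the notes from the meeting",
               "click here to claim your free prize",
               "are you free for lunch tomorrow",
               "free prize waiting, click now"],
    "Date": ["2020/08/20", "2020/08/22", "2020/08/21", "2020/08/23"],
    "label": [0, 1, 0, 1],
})


def add_features(frame):
    """Hypothetical helper: derive the sender domain and the weekday."""
    out = frame.copy()
    # Keep only the part after "@" so the encoder sees a few categories
    # instead of one category per individual address.
    out["domain"] = out["Email"].str.split("@").str[-1].str.lower()
    dates = pd.to_datetime(out["Date"], format="%Y/%m/%d")
    out["weekday"] = dates.dt.dayofweek  # Monday = 0 ... Sunday = 6
    return out


features = ColumnTransformer([
    # Text columns -> sparse real-valued TF-IDF vectors (one column name,
    # not a list, because TfidfVectorizer expects 1-D input).
    ("subject", TfidfVectorizer(), "Subject"),
    ("corpus", TfidfVectorizer(), "Corpus"),
    # Categorical columns -> binary indicator columns.
    ("domain", OneHotEncoder(handle_unknown="ignore"), ["domain"]),
    ("weekday", OneHotEncoder(handle_unknown="ignore"), ["weekday"]),
])

model = Pipeline([
    ("features", features),
    ("clf", SVC(kernel="linear")),
])

X = add_features(df)
model.fit(X, df["label"])
predictions = model.predict(X)
print(predictions)
```

Since every feature produced here is non-negative, `SVC` could be swapped for `sklearn.naive_bayes.MultinomialNB` to try Naïve Bayes instead. On real data you would fit on a training split and evaluate on held-out emails rather than predicting on the training rows as this sketch does.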
Answered by N. Kiefer on December 21, 2020