Classify Spanish Text into different Categories

Data Science Asked by m2rik on November 29, 2020

I want to recommend articles to users based on the type of article they are reading (Music, Movies, Politics, etc.).
I have three features: page title, labels, and article content.

  1. I am using a cloud API on the page title feature; it follows the IAB taxonomy to categorize articles into segments. However, the API cannot label everything: 50% of the articles remain uncategorized.
  2. I have tried basic ML models such as Naive Bayes to categorize the remaining 50% by training on the categorized set, but the accuracy is low.
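
A minimal sketch of step 2, assuming the three features are simply concatenated into one text field and that uncategorized articles carry no category. The toy records, the `to_text` helper, and the choice of `TfidfVectorizer` with `MultinomialNB` are illustrative assumptions, not the actual data or model configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy records: (page title, labels, article content, API category).
# A category of None stands for an article the API left uncategorized.
articles = [
    ("Nuevo disco de rock", "música",
     "La banda lanza un álbum con diez canciones", "Music"),
    ("Concierto en Madrid", "música",
     "El cantante presenta sus canciones en un concierto", "Music"),
    ("Elecciones generales", "política",
     "El gobierno convoca elecciones y los partidos votan", "Politics"),
    ("Debate en el congreso", "política",
     "Los partidos discuten la ley en el congreso", "Politics"),
    ("Gira de la banda", "música",
     "La banda anuncia conciertos y nuevas canciones", None),
    ("Nueva ley del gobierno", "política",
     "El congreso vota la ley propuesta por el gobierno", None),
]


def to_text(title, labels, content):
    """Join the three features into a single string (hypothetical helper)."""
    return " ".join([title, labels, content])


# Split into the categorized set (training data) and the uncategorized rest.
labeled = [a for a in articles if a[3] is not None]
unlabeled = [a for a in articles if a[3] is None]

# TF-IDF features followed by a multinomial Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit([to_text(*a[:3]) for a in labeled], [a[3] for a in labeled])

# Predict a category for every article the API could not label.
predicted = model.predict([to_text(*a[:3]) for a in unlabeled])
for article, category in zip(unlabeled, predicted):
    print(article[0], "->", category)
```

Concatenating the fields is only one option; each field could also get its own vectorizer if, for example, titles should weigh more than body text.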

What can be done to segment these articles into clusters or segments so that recommendations can be made for each type of article?

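    from sklearn import metrics
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # Bag-of-words counts -> TF-IDF weights -> linear SVM classifier
    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', LinearSVC()),
                         ])

    # Train on the training split of the categorized articles
    text_clf.fit(X_train, y_train)

    # Predict categories for the held-out test split
    predicted = text_clf.predict(X_test)

    print(metrics.classification_report(y_test, predicted))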


                 Precision    Recall  F1 Score   Support
        accuracy                          0.75       422
       macro avg      0.50      0.29      0.35       422
    weighted avg      0.74      0.75      0.72       422
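
One way to read the report: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support, so a macro average far below the weighted one usually indicates that small classes are predicted poorly. A minimal sketch with hypothetical per-class values (the per-class rows are not shown above, so these numbers are made up for illustration):

```python
# Hypothetical per-class recall and support; these numbers are illustrative
# and are NOT taken from the report above.
per_class = {
    "Music":    {"recall": 0.90, "support": 300},
    "Politics": {"recall": 0.50, "support": 100},
    "Movies":   {"recall": 0.10, "support": 22},
}


def macro_average(scores):
    """Unweighted mean: every class counts equally, however small."""
    return sum(s["recall"] for s in scores.values()) / len(scores)


def weighted_average(scores):
    """Support-weighted mean: large classes dominate the result."""
    total = sum(s["support"] for s in scores.values())
    return sum(s["recall"] * s["support"] for s in scores.values()) / total


macro = macro_average(per_class)
weighted = weighted_average(per_class)
print(f"macro avg recall:    {macro:.2f}")
print(f"weighted avg recall: {weighted:.2f}")
```

With values like these, one dominant class keeps the weighted average high even though the smallest class is almost never recovered, which is the same pattern as a macro recall of 0.29 next to a weighted recall of 0.75.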
