Data Science Asked by OldTimeRambler on February 9, 2021
I am relatively new to data science and have a question about NBSVM. I have a two-class problem with text data (newspaper headlines). I want to use NBSVM to predict whether a headline has the label 0 or 1.
As I understand it, this is how I should proceed:
Is this right? Please note that this is only a theoretical procedure, not an implementation.
You can use sklearn's "CountVectorizer" and "TfidfVectorizer" to convert the text data into vectors:
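from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# One-step alternative to CountVectorizer + TfidfTransformer (defined here, not used below)
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
# df is expected to hold the headlines in 'text' and the 0/1 labels in 'class'
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['class'], random_state=0)
# Vector representations of the text: raw counts first, then TF-IDF weighting
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
# Build a linear SVM model
svmmodel = LinearSVC().fit(X_train_tfidf, y_train)
# Transform (do not refit) the test headlines before scoring
X_test_tfidf = tfidf_transformer.transform(count_vect.transform(X_test))
print(svmmodel.score(X_test_tfidf, y_test))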
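Note that the snippet above trains a plain linear SVM on TF-IDF features; it does not include the Naive Bayes step that gives NBSVM its name. As a rough illustration only, here is a minimal sketch of that step as described by Wang and Manning (2012): compute a smoothed log-count ratio per feature, scale the binarized features by it, and fit the linear SVM on the scaled features. The toy headlines, the helper name nb_log_count_ratio, and alpha=1.0 are assumptions made for this example, and the interpolation between Naive Bayes and SVM weights from the paper is left out.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC


def nb_log_count_ratio(X, y, alpha=1.0):
    """Smoothed log-count ratio r = log((p / |p|_1) / (q / |q|_1)).

    Hypothetical helper: p and q are the smoothed feature counts of
    class 1 and class 0; alpha is the smoothing constant (assumed 1.0).
    """
    y = np.asarray(y)
    p = alpha + np.asarray(X[np.flatnonzero(y == 1)].sum(axis=0)).ravel()
    q = alpha + np.asarray(X[np.flatnonzero(y == 0)].sum(axis=0)).ravel()
    return np.log((p / p.sum()) / (q / q.sum()))


# Made-up toy data standing in for the real headlines and labels
headlines = [
    "stocks rally as markets surge",
    "markets surge on strong earnings",
    "stocks climb after earnings beat",
    "team wins championship final",
    "coach praises team after final",
    "striker scores in championship win",
]
labels = np.array([1, 1, 1, 0, 0, 0])

# Binarized unigram and bigram indicators
vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
X = vectorizer.fit_transform(headlines)

# Naive Bayes step: one log-count ratio per feature
r = nb_log_count_ratio(X, labels)

# Scale every feature column by its ratio, then fit the linear SVM
scale = sparse.diags(r)
X_nb = X @ scale
clf = LinearSVC().fit(X_nb, labels)

# New text must go through the same vectorizer and the same scaling
X_new = vectorizer.transform(["markets surge again"]) @ scale
pred = clf.predict(X_new)
print(pred)
```

In a real experiment the ratio r should be computed on the training split only and then reused unchanged for the test split, in the same way the fitted vectorizer is reused.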
Answered by Harish Kumar on February 9, 2021