TransWikia.com

Data Science : Recent Questions and Answers (Page 9)


Is it possible to have stratified train-test split of a set based on two columns?

Consider a dataframe that contains two columns, text and label. I can very easily create a stratified train-test split using sklearn.model_selection.train_test_split. The only thing I have to do...

Asked on 12/11/2021

1 answer

Keras: Prediction performance does not match accuracy

I am using Keras/CNN to identify plankton images collected with an in situ camera. When making confusion matrices on the test sets following training I am finding that the...

Asked on 12/09/2021

3 answers

How to utilize dictionary data set for text classification?

I have a dataset similar to newsgroup20 for classification. Along with the training dataset, I have a dictionary dataset that explains some of the jargon in the training dataset. Both of these are...

Asked on 12/09/2021

1 answer

Using NLP to analyze accident reports

I want to use Natural Language Processing to analyze traffic accident reports and from the text determine two things: Direction of vehicle travel (just compass directions like north, southeast, etc.) Vehicle...

Asked on 12/09/2021

3 answers

Build text complexity model based on complex examples

I am trying to build a user-specific model that predicts whether an arbitrary English text is complex for a particular user. Having complex and easy text samples makes it possible to...

Asked on 12/09/2021

1 answer

How to restrict the columns to be passed to final classifier in PMML Pipeline

I am working on building XGBoost PMML using SKLearn and SKLearn2PMML. I have some numerical, some categorical, and datetime columns from which I am creating new features inside the pipeline. When...

Asked on 12/09/2021

1 answer

Logistic regression vs Random Forest on imbalanced data set

I have an imbalanced data set where positives are just 10% of the whole sample. I am using logistic regression and random forest for classification. While comparing the results of...

Asked on 12/09/2021

1 answer

Is there any research on zonal OCR / field-level OCR / template OCR?

I've recently found the term "Zonal OCR": (source 1, 2, 3, 4). It seems to be essentially OCR,...

Asked on 12/09/2021

0 answers

What is the difference between ICR and OCR?

I've just found the term "Intelligent Character Recognition" (ICR) on Wikipedia and other pages. According to Wikipedia: In computer science, intelligent character...

Asked on 12/09/2021

0 answers

Pandas/Python - comparing two columns for matches not in the same row

I have this data: I wanted to compare A and B for matches not by row but rather search A0...

Asked on 12/09/2021

1 answer
