# Data Science: Recent Questions and Answers (Page 3)

## How can I tell if my model is overfitting from the distribution of predicted probabilities?

All, I am training a light gradient boosting model and have used all of the necessary parameters to help with overfitting. I plot the predicted probability (i.e., the probability of having cancer) distribution from...

## Analysis for basic weight training analysis?

TL;DR: I'm doing a fairly basic project which involves exercise. It seems that descriptive statistics and basic data visualization (e.g., a line graph) would be most appropriate for this project, but...

## Can an entire data frame be used as a prediction variable?

I am attempting to use XGBoost in R to train a model that predicts a fixed number of target variables using all data from previous dates, as well as the...

## Question on ANOVA and Correlation/Association

I've been working on examining statistical relationships between variables: Pearson's and Spearman's for continuous variables; Kendall's Tau and Cramér's V for ordinal/nominal variables. I know there are many more ways. Recently I read about ANOVA and...

## Is Python a viable language to do statistical analysis in?

I originally came from R, but Python seems to be the more common language these days. Ideally, I would do all my coding in Python as the syntax is...

## Comparing Datasets - Should I use the same test dataset?

I am training an ML CNN model. I want to compare different image datasets. The datasets all have different characteristics (translated or not, rotated or not, etc.). I do not modify...

## Choose points to maximize volume of convex hull

Suppose I have N points (labeled 1, 2, ..., k, ..., N) in D dimensions. I'd like to choose the order of points such that, after each point, the volume...

## How is the output of a maxpool layer with window size=1x2 and stride=2 calculated?

I'm looking at the architecture proposed in the following paper: Baoguang Shi et al., An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to...

## Generating the right target for an LSTM model

I am trying to explain my question using a simplified dataset. Given the following dataset: day f1 f20 0 ...

## Does it make sense to build a ROC curve for a decision tree where there are multiple thresholds you can adjust?

I understand building a ROC curve when the output is a probability, say, from a logistic regression model. You can build a ROC curve by varying the cutoff threshold. But...
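
For readers unfamiliar with the threshold sweep mentioned in this excerpt, here is a minimal sketch of how ROC points can be computed from predicted probabilities. The function name `roc_points` and the toy labels and scores are illustrative assumptions, not taken from the question; it covers only the probability-output case and does not address the decision tree part.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs,
    one per candidate cutoff, ordered from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Every distinct score is a candidate cutoff; +inf makes the curve start at (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        # Predict the positive class whenever the score reaches the cutoff.
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical toy data: 1 = positive class, scores = predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each lowering of the cutoff can only add predicted positives, so the points move monotonically from (0, 0) toward (1, 1).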