Chi-square as an evaluation metric for nonlinear machine learning regression models

Question

I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) from 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous values. An evaluation box plot looks like this:

I experiment with both linear models (linear regression, linear SVMs) and nonlinear models (SVMs with an RBF kernel, random forests, gradient boosting machines). The models are trained using cross-validation (~1600 samples), and 25% of the dataset is held out for testing (~540 samples). I am using R-squared and root mean square error (RMSE) to evaluate the models on the test samples. I am interested in finding an evaluation measure to compare linear models to nonlinear ones.

This is for scientific research. It was pointed out to me that R-squared might not be an appropriate measure for nonlinear models, and that a chi-square test would be a better measure of goodness of fit.

The problem is that I am not sure of the best way to do it. When I search for chi-square as a goodness-of-fit measure, I only find examples where the chi-square test is used to check whether categorical samples fit a theoretical expectation, such as here. So here are my considerations/questions:

One way I could think of is to bin the predicted (continuous) values into categories and compare the predicted distribution to the ground-truth distribution using a chi-square test. But that doesn't make much sense. For example, suppose a model perfectly predicts ground-truth values 2, 3, and 4, but predicts every 5 as 1 and every 1 as 5. If values 1 and 5 are about equally frequent, the two binned distributions match, so the chi-square test I propose here would fail to reject the null hypothesis, although the model mispredicts 2 of the 5 values.
As described in a tutorial from USC, I could use formula (1) to compute the chi-square value, where the experimentally measured quantities (x_i) are my ground-truth values and the hypothesized values (mu_i) are my predicted values. My question is, what is the variance? If we treat each value 1, 2, 3, 4, and 5 as a distinct category, then the variance of the ground truth within each category is equal to zero. Also, how does one compute the degrees of freedom (N-r)?
Regarding my goal of finding an evaluation measure to compare linear models to nonlinear ones: is the chi-square test the best (or even a good) choice? From what I've seen so far in machine learning competitions for regression tasks, either MSE or RMSE is used for evaluation.
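
The concern about binning can be illustrated with a small sketch. The data below are made up (100 samples per level, and a hypothetical model that is perfect on 2, 3, and 4 but swaps 1 and 5); they are not the dataset described above. Under those assumptions, a goodness-of-fit test on the binned counts sees two identical distributions, while a per-sample error such as RMSE still exposes the swap.

```r
# Made-up ground truth: 100 samples in each of the five ordinal levels.
truth <- rep(1:5, each = 100)

# Hypothetical model: perfect on 2, 3, and 4, but swaps 1 and 5.
predicted <- truth
predicted[truth == 1] <- 5
predicted[truth == 5] <- 1

# Counts per level for the predictions, and level proportions of the truth.
observed.counts <- as.vector(table(factor(predicted, levels = 1:5)))
expected.probs <- as.vector(table(factor(truth, levels = 1:5))) / length(truth)

# Chi-square goodness-of-fit test on the binned counts only.
fit <- chisq.test(observed.counts, p = expected.probs)

# Per-sample error, which compares each prediction to its own target.
rmse <- sqrt(mean((truth - predicted)^2))

print(fit$statistic)
print(fit$p.value)
print(rmse)
```

Because the test only looks at how many predictions fall into each bin, it cannot tell which samples were assigned to which bin, so it gives no evidence against this model even though 40% of its predictions are off by 4.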

bstrain · Answer

Use your test data to compare the predictive performance of each model.

In R you could do this like:

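# Predict on the held-out test set with each fitted model.
linear.predictions <- predict(linear.model, newdata = test.data)
nonlinear.predictions <- predict(nonlinear.model, newdata = test.data)

# Relative error of each prediction against the true target value.
linear.percent.difference <- (test.data$TARGET_VARIABLE -
                              linear.predictions) /
                              test.data$TARGET_VARIABLE

nonlinear.percent.difference <- (test.data$TARGET_VARIABLE -
                                 nonlinear.predictions) /
                                 test.data$TARGET_VARIABLE

# Mean signed relative error per model. Positive and negative errors can
# cancel here; wrap the differences in abs() to avoid that.
linear.grade <- mean(linear.percent.difference)
nonlinear.grade <- mean(nonlinear.percent.difference)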

This is a pretty simple way to do it, but it is one that works for me and is easy to understand, especially if your audience is going to eye-glaze as soon as you say "Chi-square..." Get creative!
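
If you would rather report the RMSE mentioned in the question, the same idea carries over: score every model with the same function on the same test targets. The sketch below is only an illustration; the helper functions rmse and mae and the toy vectors are made up here, and in practice the vectors would be test.data$TARGET_VARIABLE and the predictions from above.

```r
# Hypothetical helpers: root mean square error and mean absolute error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Toy stand-ins for the test targets and the two models' predictions.
actual <- c(1, 2, 3, 4, 5)
linear.predictions <- c(1.5, 2.5, 2.5, 3.5, 4.5)
nonlinear.predictions <- c(1.1, 2.2, 2.9, 4.3, 4.8)

# One row per model, scored with identical metrics on identical data.
scores <- data.frame(
  model = c("linear", "nonlinear"),
  rmse = c(rmse(actual, linear.predictions),
           rmse(actual, nonlinear.predictions)),
  mae = c(mae(actual, linear.predictions),
          mae(actual, nonlinear.predictions))
)

print(scores)
```

Neither metric depends on whether the model is linear, so the rows are directly comparable.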
