Data Science Asked on June 10, 2021
I have a very simple program below that builds a model using multi-output regression. Even though the training data consists entirely of positive float values, I'm discovering that predictions often come back negative. How can I tell scikit-learn to enforce a floor of 0 (in other words, never make negative predictions)? Or what other tools/libraries can I use to ensure this doesn't occur?
import csv
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X = []
Y = []
results = []

with open("folder/training_data.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    for row in reader:  # each row is a list
        x = row[:5]
        y = row[5:]
        X.append(x)
        Y.append(y)

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
model.fit(xtrain, ytrain)
...
prediction = model.predict([[1.0, 2.0, 3.0, 4.0, 5.0]])
# I get e.g. [[-0.2, -0.1]] back where I'd rather have [[0, 0]]
The X features are float values ranging from 0.0 to 1e7, and the Y target variables also range from 0.0 to 1e7.
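For context, one common workaround is simply to clamp the model's output at zero after prediction, since GradientBoostingRegressor itself places no constraint on the output range. Below is a minimal sketch of that approach using numpy.clip; the synthetic X and Y arrays here are stand-ins for the CSV data (not the actual dataset), and the model setup mirrors the code above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1e7, size=(200, 5))  # stand-in for the 5 CSV features
Y = rng.uniform(0.0, 1e7, size=(200, 2))  # stand-in for the 2 targets

model = MultiOutputRegressor(estimator=GradientBoostingRegressor())
model.fit(X, Y)

# Clamp every prediction into [0, inf) after the fact.
prediction = np.clip(model.predict(X[:3]), 0.0, None)
```

An alternative, if clipping feels too crude, is to train on transformed targets (e.g. wrapping the regressor in sklearn.compose.TransformedTargetRegressor with a log transform), though that only guarantees non-negative output when the inverse transform itself cannot produce negative values.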