Data Science Asked on June 10, 2021
I have a very simple program (below) that builds a model using multi-output regression. Even though all the training data consists of positive float values, the predictions it makes are often negative. How can I tell scikit-learn to enforce a floor of 0 (in other words, never make a negative prediction)? Or what other tools/libraries can I use to ensure this doesn’t occur?
import csv
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
X = []
Y = []
results = []
with open("folder/training_data.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    for row in reader: # each row is a list
        x = row[:5]
        y = row[5:]
        X.append(x)
        Y.append(y)
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
model.fit(xtrain, ytrain)
...
prediction = model.predict([[1.0,2.0,3.0,4.0,5.0]])
# I get e.g. [[-0.2, -0.1]] back where I'd rather have [[0,0]]
Both the X input variables and the Y target variables are float values ranging from 0.0 to 1e7.
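For reference, one simple way to get the floor described above is to clip the predictions after calling model.predict, rather than constraining the model itself. The sketch below is only an illustration: clip_at_zero is a hypothetical helper name (not part of scikit-learn), and the sample values are the ones from the comment in the code above.

```python
def clip_at_zero(prediction):
    """Replace every negative value in a 2D prediction with 0.0.

    `prediction` is assumed to be a sequence of rows (nested lists or a
    2D array), which is the shape model.predict returns for multi-output
    regression. Hypothetical helper, not a scikit-learn function.
    """
    return [[max(0.0, value) for value in row] for row in prediction]


# Sample values taken from the comment in the question's code.
raw_prediction = [[-0.2, -0.1]]
print(clip_at_zero(raw_prediction))
```

If the prediction is a NumPy array, np.clip(prediction, 0, None) does the same thing in one call. Note that clipping only changes the output; the underlying model is still fitted without any non-negativity constraint.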