Data Science Asked by Aníbal Sánchez Numa on April 15, 2021
I'm taking my first steps with AI and machine learning, and I have run into the following issue. I'm trying to predict the number of confirmed COVID-19 cases from the day number using the scikit-learn library. That is, my input is the number of days since the pandemic started in my country, and my output is the number of confirmed cases on that date. However, with both GradientBoosting and RandomForest I get the same output value for every test input. I've posted the Python code below, as it is very short:
import numpy as np
import pandas
from sklearn import ensemble

# Load the semicolon-separated file; the separator is passed by keyword
datos = pandas.read_csv('covid.csv', sep=';')
# Input: day number since the pandemic started, reshaped to a 2-D column
entrada = np.array(datos['ORDEN']).reshape(-1, 1)
# Output: confirmed cases on the corresponding day
salida = datos['CASOS']

regr = ensemble.GradientBoostingRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
# Days 63 to 69 are the inputs to predict
test = np.arange(63, 70).reshape(-1, 1)
print(regr.predict(test))

regr = ensemble.RandomForestRegressor(random_state=0, n_estimators=500).fit(entrada, salida)
print(regr.predict(test))
My output is this:
[1782.99976513 1782.99976513 1782.99976513 1782.99976513 1782.99976513
1782.99976513 1782.99976513]
[1773.99 1773.99 1773.99 1773.99 1773.99 1773.99 1773.99]
What am I doing wrong? Thanks in advance.
This depends entirely on your feature engineering. In this case, I suspect your model is predicting a single constant value, such as the mean or median of your target.
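One common reason tree ensembles return a single repeated value is that every test input lies beyond the range seen during training. Trees predict piecewise-constant values, so all such inputs fall into the same outermost leaf. The sketch below illustrates this on synthetic data. It assumes (this is not confirmed by the question) that the training file covers days 1 to 62, and the linear case counts are invented purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for covid.csv (assumption: training covers days 1..62)
days = np.arange(1, 63).reshape(-1, 1)
cases = 30.0 * days.ravel()  # made-up, steadily growing counts

# Inputs beyond the training range, as in the question
future = np.arange(63, 70).reshape(-1, 1)

# Every tree routes all future days to its right-most leaf,
# so the averaged forest prediction is the same for each of them
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(days, cases)
forest_pred = forest.predict(future)

# A linear model, by contrast, continues the trend past day 62
linear = LinearRegression().fit(days, cases)
linear_pred = linear.predict(future)

print(forest_pred)
print(linear_pred)
```

If this is what is happening with the real data, the repeated value will sit close to the last few training targets rather than near the overall mean, which is one way to check.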
It might also help to try other kinds of models. Since you are trying to predict the count of an event over a given period of time, Poisson models could be useful. They were still experimental in sklearn at the time of writing, but the documentation should help you understand how they work.
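As a rough sketch of the Poisson suggestion, the example below fits `PoissonRegressor` from `sklearn.linear_model`, which is one possible choice and not necessarily the model the answer had in mind. The data is synthetic: the exponential growth curve and the range of days 1 to 62 are assumptions made for illustration, not values from the question.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic stand-in for covid.csv (assumption: days 1..62, roughly exponential growth)
days = np.arange(1, 63).reshape(-1, 1)
cases = np.round(np.exp(0.08 * days.ravel() + 1.0))  # made-up counts

# Poisson GLM with a log link; alpha=0.0 turns off the default L2 penalty
model = PoissonRegressor(alpha=0.0, max_iter=1000).fit(days, cases)

# Because of the log link, predictions keep growing beyond the training range
future = np.arange(63, 70).reshape(-1, 1)
poisson_pred = model.predict(future)
print(poisson_pred)
```

A log-link model like this extrapolates exponentially, which may overshoot badly once real growth slows, so forecasts far beyond the training window should be treated with caution.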
Answered by Julio Jesus on April 15, 2021