Data Science Asked by Adnos on May 20, 2021
I’m new to machine learning, and I’m currently practicing with datasets I find on Kaggle. At the moment I’m trying to predict the price of an Audi based on the model, mileage, and manufacturing year, using a slightly modified version of this set (the only columns I use are model, mileage, price, and year).
I have the following code written down which makes use of linear regression.
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
df = pd.read_csv('audi.csv', sep=";")

# One-hot encode the model column and replace it with the dummy columns
cars = pd.get_dummies(df['model'], prefix='car')
new_df = pd.concat([df, cars], axis='columns')
new_df = new_df.drop(['model'], axis='columns')

# Features are mileage, year, and the model dummies; target is price
x_data = new_df.drop(['price'], axis='columns')
y_data = new_df['price']

model = LinearRegression()
model.fit(x_data, y_data)
This gives me a meagre model.score(x_data, y_data) of 0.8290666609212749.
To test for a custom car that I find on one of the many used car websites, I do the following:
#Audi A3, 42100 mileage, 2016 built
model.predict([[42100,2016,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]])
This works, but as you can see it’s a huge hassle, because I have to build the whole row of dummy variables by hand. Is there a way of making this simpler?
As an alternative to pd.get_dummies() you could use sklearn.preprocessing.OneHotEncoder. The advantage is that once you create an encoder object, you can reuse it to transform new data points into the same one-hot form. You can pass OneHotEncoder(categories=...) to assign the categories manually, or call OneHotEncoder.fit() to extract them from a particular feature; OneHotEncoder.transform(feature) then converts that feature to one-hot form. pd.get_dummies() does not provide this reusability, so sklearn’s OneHotEncoder makes things simpler.
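A minimal sketch of the idea, using a small made-up DataFrame in place of the Kaggle Audi set (column names and values here are illustrative, not from the original data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for audi.csv
df = pd.DataFrame({
    'model':   ['A1', 'A3', 'A3', 'A4', 'A1', 'A4'],
    'mileage': [15000, 42100, 30000, 60000, 22000, 8000],
    'year':    [2017, 2016, 2018, 2014, 2016, 2019],
    'price':   [15500, 17000, 19500, 12000, 14800, 28000],
})

# Fit the encoder once; it remembers the categories it saw
encoder = OneHotEncoder(handle_unknown='ignore')
model_onehot = encoder.fit_transform(df[['model']]).toarray()

# Numeric features plus the one-hot columns
x_data = np.hstack([df[['mileage', 'year']].to_numpy(), model_onehot])
y_data = df['price']

reg = LinearRegression().fit(x_data, y_data)

# For a new car, reuse the same encoder instead of writing the dummy row by hand
new_car = pd.DataFrame({'model': ['A3'], 'mileage': [42100], 'year': [2016]})
x_new = np.hstack([new_car[['mileage', 'year']].to_numpy(),
                   encoder.transform(new_car[['model']]).toarray()])
print(reg.predict(x_new))
```

Because the fitted encoder stores the category list, the new row is guaranteed to have its dummy columns in the same order as the training data, which is exactly the bookkeeping that was a hassle with the hand-written list.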
Answered by Ankita Talwar on May 20, 2021