Data Science Asked by Lamden on January 11, 2021
I am trying to classify cars for a towing company. Junky cars earn more when sent to the junkyard, while more valuable cars should earn more at auction, even after the auction fee. A logistic regression on Make, Model, Mileage, Year and Run status helps us decide more accurately which cars should go where, but a difficulty arises: sometimes a car that would be classified as junk is actually an outlier and sells for a lot of money. So when optimizing our model, we care less about being right or wrong on any individual car than about maximizing our bottom line.
All of the models I have seen (logistic regression, random forest, linear regression) make predictions on a row-by-row basis.
What would be a good model for maximizing the aggregate sum across all predictions?
Below is a reprex of my data, as well as basic code I used.
What I have tried so far is to look at past data and classify, in hindsight, what should have been done, based on the prices earned at auction versus the available junk prices. I then ran a glm against that classification to predict future cars. As mentioned above, my code improved the accuracy of our decisions and would have correctly sent more cars to junk, but some cars that we classified as junk sold for so much at auction that, in aggregate, it wasn’t worth sending any to junk.
What is the proper way to approach this?
# Reprex: four towed cars with auction results and junkyard offers
cars <- structure(list(
  YearOfCar = c(2009L, 2009L, 2003L, 2004L),
  Make = c("Hyundai", "Lexus", "Ford", "Toyota"),
  Model = c("Sonata", "GS 350", "F-250 Super Duty", "Camry"),
  PickUpState = c("MN", "LA", "MA", "NJ"),
  Auction_Result = c(650, 625, 425, 1500),
  Auction_Fee = c(144.25, 373.54, 213.5, 187),
  Mileage = c(116120L, 198900L, 140241L, 312927L),
  Runs = structure(c(1L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
  junkyard_Offer = c(230L, 235L, 140L, 300L),
  Date = structure(c(17592, 17707, 17674, 17583), class = "Date")
), row.names = 3:6, class = "data.frame")

# Hindsight label: 1 if the auction price net of fees beat the junkyard offer, else 0
cars$hindsight <- ifelse(cars$Auction_Result - cars$Auction_Fee > cars$junkyard_Offer, 1, 0)

# Logistic regression on the hindsight label
glmodel <- glm(hindsight ~ Make + Model + Mileage + Runs, data = cars, family = "binomial")
prediction <- predict(glmodel, cars, type = "response")

# Classify as auction (1) when the predicted probability exceeds the cutoff
prediction_classifier <- ifelse(prediction > .501, 1, 0)

# Dollars earned per car under the model's decision
cars$prediction_results <- ifelse(prediction_classifier == 1,
  cars$Auction_Result - cars$Auction_Fee, cars$junkyard_Offer)
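
To make the "bottom line" comparison concrete, one option is to total the dollars earned under each policy rather than counting correct classifications. The following is only a minimal sketch: `total_profit` is a hypothetical helper, the dollar figures are copied from the reprex, and the values in `predicted_prob` are made up purely for illustration (they are not output from the glm).

```r
# Hypothetical helper: total dollars earned when each car follows `send_to_auction`
# (TRUE = send to auction, FALSE = send to junkyard).
total_profit <- function(auction_net, junk_offer, send_to_auction) {
  # Recycle a single TRUE/FALSE so "send everything" policies work too.
  send_to_auction <- rep_len(send_to_auction, length(auction_net))
  sum(ifelse(send_to_auction, auction_net, junk_offer))
}

# Values copied from the four-row reprex: auction result minus auction fee.
auction_net <- c(650, 625, 425, 1500) - c(144.25, 373.54, 213.5, 187)
junk_offer  <- c(230, 235, 140, 300)

# Made-up probabilities standing in for model output (assumption, not real predictions).
predicted_prob <- c(0.9, 0.4, 0.6, 0.8)

# Compare the model's policy with simple baselines and the hindsight optimum.
policy_totals <- c(
  model       = total_profit(auction_net, junk_offer, predicted_prob > 0.5),
  all_auction = total_profit(auction_net, junk_offer, TRUE),
  all_junk    = total_profit(auction_net, junk_offer, FALSE),
  hindsight   = total_profit(auction_net, junk_offer, auction_net > junk_offer)
)
print(policy_totals)
```

Comparing the `model` total against `all_auction` and `all_junk` shows whether a classifier that is more accurate per car actually earns more in aggregate, which is the situation described above.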
This is an interesting problem that potentially involves many aspects of ML. Here are a few thoughts:
Finally, it's certainly worth investigating whether the valuable cars can actually be identified from the features: would a human expert with only the information in the features be able to classify a car correctly? I could imagine, for instance, that if a particular car is valuable because it was used in a famous movie, it doesn't help to know just its model and mileage. It would also be useful to check the relation between how common a car model is and its junk/valuable status. This could be an important indicator to include in the model as a feature. In the simplest case, it might even be possible to detect potentially valuable cars just by looking at this indicator, in which case there's no need for ML at all.
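
The check on how common a model is versus its junk/valuable status could be sketched roughly as follows. This is only an illustration: the `toy` data frame is invented, `model_count` is a hypothetical feature name, and `hindsight` is assumed to be the 0/1 label from the question (1 meaning the auction beat the junkyard offer).

```r
# Invented rows for illustration only; not real towing records.
toy <- data.frame(
  Model     = c("Camry", "Camry", "Camry", "Sonata", "Sonata", "GS 350"),
  hindsight = c(0, 0, 1, 0, 0, 1)  # 1 = auction beat the junkyard offer
)

# Number of times each model appears in the data, as a proxy for how common it is.
toy$model_count <- as.integer(ave(toy$hindsight, toy$Model, FUN = length))

# Share of cars worth auctioning, grouped by how common the model is.
rate_by_count <- aggregate(hindsight ~ model_count, data = toy, FUN = mean)
print(rate_by_count)
```

If the auction-worthy share varies strongly with `model_count` on the real data, the count could be added as a feature, or used directly as a simple rule.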
Answered by Erwan on January 11, 2021