TransWikia.com

Classification algorithm that only matches trained examples

Data Science Asked by dwkd on August 8, 2021

I have 10 categorical features and a multi-class target.

Training data contains rows where the same 10 categorical features may map to a different target class.

What classification algorithm should I choose that fulfils the following criteria:

  • prediction output should only show results if the prediction input matches a row example that existed in the training data. (I don’t want to make assumptions on inputs it has never been trained on, as this model will deal with people’s lives in the real world)
  • prediction output should be targets that match exactly the prediction input, ordered by their frequency in the training data, with the highest as ‘the best match’ at the top

I realize this looks like it could be ‘easily’ solved with databases but I would like to use Classification and train a model instead of having a cumbersome db infrastructure with columns, keys, indexes and other boring things.

One Answer

Strictly speaking Machine Learning is not the answer to this problem imho, because a ML method always works by generalizing what is in the data. In other words ML methods are meant to make some guesses while statistically minimizing the risk of error: if you don't want any generalization or risk of error, you don't want ML.

There might be some symbolic methods which could do what you describe, but essentially it's just a very simple deterministic method:

  1. Obtain the frequency for each distinct instance (including label) in the training data in a map
  2. For each distinct instance (only features) keep the most frequent one and discard all the others.
  3. When applying, just look up the instance in the map and assign the corresponding label.

a cumbersome db infrastructure with columns, keys, indexes and other boring things.

There's a confusion here: the question of using a database or not is irrelevant here since it's a matter of how you store the data, it's just not related to using ML or not.

Usual advice: use the right tool for the right job... even if it's boring ;)

Answered by Erwan on August 8, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP