TransWikia.com

What is Ground Truth?

Data Science Asked by Media on January 3, 2021

In the context of Machine Learning, I have seen the term Ground Truth used a lot. I have searched a lot and found the following definition in Wikipedia:

In machine learning, the term "ground truth" refers to the accuracy of the training set’s classification for supervised learning techniques. This is used in statistical models to prove or disprove research hypotheses. The term "ground truthing" refers to the process of gathering the proper objective (provable) data for this test. Compare with gold standard.

Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. This depends on the ground truth of the messages used to train the algorithm – inaccuracies in the ground truth will correlate to inaccuracies in the resulting spam/non-spam verdicts.

The point is that I really cannot work out what it means. Is it the label assigned to each data object, the target function that assigns a label to each data object, or something else?

3 Answers

The ground truth is what you measured for your target variable for the training and testing examples.

Nearly all the time you can safely treat this the same as the label.

In some cases it is not precisely the same as the label. For instance, if you augment your data set, there is a subtle difference between the ground truth (your actual measurements) and how the augmented examples relate to the labels you have assigned. However, this distinction is not usually a problem.
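To make the augmentation point concrete, here is a minimal sketch. The `Example` class, its `measured` flag, and the `flip_horizontal` helper are hypothetical names made up for illustration; they do not come from any particular library. The idea is only that an augmented example inherits its label from a measured one instead of being measured itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    pixels: List[List[int]]  # tiny grayscale "image" (hypothetical data)
    label: str               # label used for training
    measured: bool           # True only if the label was observed for this exact input


def flip_horizontal(example: Example) -> Example:
    """Return a mirrored copy; the label is inherited, not re-measured."""
    flipped = [list(reversed(row)) for row in example.pixels]
    return Example(pixels=flipped, label=example.label, measured=False)


original = Example(pixels=[[0, 1, 2], [3, 4, 5]], label="cat", measured=True)
augmented = flip_horizontal(original)

print(augmented.pixels)                     # [[2, 1, 0], [5, 4, 3]]
print(augmented.label, augmented.measured)  # cat False
```

Both examples carry the label "cat", but only the original one is ground truth in the strict sense of something that was actually measured.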

Ground truth can be wrong. It is a measurement, and there can be errors in it. In some ML scenarios it can also be a subjective measurement where it is difficult to define an underlying objective truth, e.g. expert opinion or analysis that you are hoping to automate. Any ML model you train will be limited by the quality of the ground truth used to train and test it, and that is part of the explanation in the Wikipedia quote. It is also why published articles about ML should include full descriptions of how the data was collected.

Correct answer by Neil Slater on January 3, 2021

Ground truth: That is the reality you want your model to predict.

It may contain some noise, but you want your model to learn the underlying pattern in the data that gives rise to this ground truth. In practice, your model will never predict the ground truth perfectly, because the ground truth itself contains some noise and no model achieves one hundred percent accuracy, but you want your model to get as close as possible.
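A small simulation can illustrate the effect of noise. This is only a sketch under assumed conditions: binary labels, a measured ground truth that is flipped at random with probability `flip_rate`, and a hypothetical "perfect" model that always outputs the true underlying label. The function name and parameters are made up for this example.

```python
import random


def simulate_noisy_ground_truth(n: int, flip_rate: float, seed: int = 0) -> float:
    """Score a hypothetical perfect model against noisy measured labels."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        true_label = rng.randint(0, 1)
        # The measured ground truth is wrong with probability flip_rate.
        if rng.random() < flip_rate:
            measured = 1 - true_label
        else:
            measured = true_label
        prediction = true_label  # the model knows the underlying truth
        agree += int(prediction == measured)
    return agree / n


measured_accuracy = simulate_noisy_ground_truth(n=10_000, flip_rate=0.1)
print(f"accuracy against noisy ground truth: {measured_accuracy:.3f}")
```

With a 10% flip rate, the score should come out near 0.9 even though the model makes no mistakes about the underlying pattern, so the measured accuracy is capped by the quality of the ground truth.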

Answered by Vivek Khetan on January 3, 2021

This is a simplified explanation: ground truth is a term used in statistics and machine learning that means checking the results of machine learning for accuracy against the real world. The term is borrowed from meteorology, where "ground truth" refers to information obtained on site.

How you get that ground truth: there are many options, but usually humans evaluate each example and assign the right labels to it. For example, you can upload a group of images to trainingset.ai, label them (where there is a car, where there is a pedestrian, and so on), and that will be your ground truth to evaluate or train your AI algorithm.
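As a rough sketch of what "evaluating against ground truth" looks like in code, the snippet below compares model output with human-assigned labels. The `accuracy` helper and the label lists are hypothetical and written for this example only; they are not tied to any labeling tool.

```python
from typing import Sequence


def accuracy(ground_truth: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of predictions that match the human-assigned labels."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(g == p for g, p in zip(ground_truth, predictions))
    return matches / len(ground_truth)


# Hypothetical labels: one per image region, assigned by a human annotator.
human_labels = ["car", "pedestrian", "car", "car"]
model_output = ["car", "car", "car", "pedestrian"]

score = accuracy(human_labels, model_output)
print(score)  # 0.5
```

The score is only as meaningful as the human labels it is computed against: if the annotators disagree or make mistakes, the same errors carry over into the evaluation.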

Answered by Esteban on January 3, 2021
