Does the textbook "Machine Learning - A Probabilistic Perspective" use input and output in the opposite way?

Question

Chapter 1 of "Machine Learning - A Probabilistic Perspective" by Kevin Patrick Murphy says:

We now consider unsupervised learning, where we are just given output data, without any inputs. The goal is to discover “interesting structure” in the data; this is sometimes called knowledge discovery.

But this post says:

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

I have seen this kind of explanation more often than the one in the book.

It seems that the book uses the terms in the opposite way, viewing the dataset as the output of some input. Is my understanding right?

pythinker · Answer

No, it is not the case. I’m almost sure that it’s a typo and it should be changed to:

“We now consider unsupervised learning, where we are just given input data, without any outputs.”

It can be deduced by looking at the definition of supervised learning from the book:

“In this section, we discuss classification. Here the goal is to learn a mapping from inputs x to outputs y, where y ∈ {1,...,C}, with C being the number of classes.”
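
As a sketch of that reading in symbols (the notation $\mathcal{D}$, $N$, and the index $i$ are assumptions added here for illustration, not a quotation from the book), the two settings differ only in whether the outputs $y$ from the definition above are present in the training set:

```latex
\begin{align*}
\text{supervised:} \quad & \mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}, \qquad y_i \in \{1, \dots, C\} \\
\text{unsupervised:} \quad & \mathcal{D} = \{x_i\}_{i=1}^{N}
\end{align*}
```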

Flipurbit · Answer

It seems to me that you almost have the right picture in your head. I would describe the difference between supervised and unsupervised learning as follows:

1. Supervised Learning

Supervised learning takes a very specific dataset as input (usually tailored by the researcher or programmer for the specific task, a step often called cleaning the dataset), which the machine learning model then uses in its training process. This dataset is most commonly accompanied by one or more additional sets of data that the researcher or programmer creates on purpose as metadata about the input dataset, such as tagged regions for an image dataset. This is the "supervised" part. The output is a model that can take data similar to the input and predict, with some accuracy, the metadata that should be assigned to it. In the image example, you would provide an image and the model would output the tags for that image.

2. Unsupervised Learning

Unsupervised learning follows essentially the same principle, but without any specially designed metadata accompanying the input. The approach is called unsupervised because it does not require a human to assist the model in any way once the input is provided. This class of ML models is widely considered a much more challenging problem, because it requires us to understand what it really means to take raw image data as input (assuming the same image-tagging example) and, with nothing else, return all the tags that belong with an image.
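
To make the contrast concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: the toy data `X` and `y`, the helper names `fit_supervised`, `predict`, and `fit_unsupervised`, and the choice of a nearest-centroid classifier and a two-cluster 1-D k-means as stand-ins for "a supervised model" and "an unsupervised model". The point is only that the first fit receives inputs together with outputs, while the second receives inputs alone.

```python
# Toy 1-D inputs forming two obvious groups (hypothetical data).
X = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
# Outputs (labels) are only available in the supervised setting.
y = ["small", "small", "small", "large", "large", "large"]


def fit_supervised(inputs, outputs):
    """Learn one centroid per label from (input, output) pairs."""
    groups = {}
    for x, label in zip(inputs, outputs):
        groups.setdefault(label, []).append(x)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def fit_unsupervised(inputs, iterations=10):
    """Two-cluster 1-D k-means: find structure using the inputs alone."""
    centres = [min(inputs), max(inputs)]  # simple deterministic start
    for _ in range(iterations):
        clusters = [[], []]
        for x in inputs:
            nearest = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
            clusters[nearest].append(x)
        # Move each centre to the mean of its cluster (keep it if empty).
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return centres


model = fit_supervised(X, y)      # uses inputs AND outputs
centres = fit_unsupervised(X)     # uses inputs only

print(predict(model, 4.5))        # a label learned from y
print(centres)                    # two cluster centres, with no names attached
```

Note that the unsupervised result contains two centres but no names for them: the grouping is discovered from the data, while attaching a meaning such as "small" or "large" still requires outside information.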
