ML, Statistics and Mathematics

Question

I have just started getting my hands wet in ML and every time I try delving deeper into the concepts/code, I face the challenges of the mathematics and its cryptic notations.
Coming from a Computer Science background, I do understand bit of them but majority goes tangent.
Say, for example below formulae from this page -

I try and really want to understand them but somehow get confused and leave it everytime.
Can you please suggest how to start with it? Any starting pointers or advise please.

German C M · Answer

I would recommend a TOP-DOWN learning path:

get a first grasp about what algorithms types there are based on possible use cases (classification, regression, clustering, etc); this way, you know the WHAT CAN I SOLVE WITH THIS
for the algorithms you are interested in (a basic one could be a linear regression trained via gradient descent optimizer), you can get a first feeling using libraries like 
scikit-learn which wrap all the math in between, but give you results which you can quickly check and play with --> HOW CAN I SOLVE IT
after you have played around it, you can have a deeper look at how the algorithms work, with the linear algebra, statistics and calculus concepts you need to really understand them (basically, the math fomulae you said) --> HOW IT WORKS

Good sources:

Python Machine Learning book, by Sebastian Raschka (good balance between theory and practice)
Jason Brownlee blog and books (very applied use cases)
scikit-learn documentation, which includes the math used in their code

Romain Reboulleau · Answer

It is quite true that papers or books use notations that sometimes seem obvious to people who are used to dealing with the mathematical aspects, but are meaningless for the others. Ways of understanding the math include:

Following theoretical courses or trainings
Reading dedicated books
Asking people around you
Asking people on forums such as this one, or Cross Validated for stats formulae
Getting it by yourself upon re-reading parts of the paper/book you didn't get at the first time

There are some notations/conventions that are implicitly accepted in data science / machine learning papers, such as:

Using $X$ as input, $y$ as output, $theta$ as model parameters
Using $hat{y}$ for the estimator of the true $y$
Assuming that vectors are column vectors

The list would be too long to include here.

Regarding the example above, what we face is a constrained optimization.

The $max$ statement means that we are looking for a maximum value of the expression that follows. What is below (namely, the $Delta_{ij}$ values) the $max$ is the list of "free" parameters that change the value of the expression.

The $max$ statement is prefixed by $arg$, which means that we do not have interest in the expression's maximum value, but rather in the $Delta_{ij}$ set that leads to that value.

Then we face a $s.t.$ statement, because this is no ordinary optimization, we also have to respect the several constraints expressed after $s.t.$. Those can be inequations, equations, belonging constraints, etc., either explicit ($Delta_{ij} > 0$) or more implicit.

ML, Statistics and Mathematics

2 Answers

Add your own answers!

Ask a Question