# ML, Statistics and Mathematics

Data Science Asked by ranit.b on December 20, 2020

I have just started getting my hands dirty in ML, and every time I try delving deeper into the concepts/code, I face the challenge of the mathematics and its cryptic notation.
Coming from a Computer Science background, I understand a bit of it, but most of it goes over my head.

I try and really want to understand it, but somehow I get confused and give up every time.

I would recommend a TOP-DOWN learning path:

1. get a first grasp of what types of algorithms there are, based on possible use cases (classification, regression, clustering, etc.); this way, you know WHAT CAN I SOLVE WITH THIS
2. for the algorithms you are interested in (a basic one could be a linear regression trained via a gradient descent optimizer), you can get a first feel for them using libraries like scikit-learn, which wrap all the math in between but give you results you can quickly check and play with --> HOW CAN I SOLVE IT
3. after you have played around with them, you can take a deeper look at how the algorithms work, with the linear algebra, statistics and calculus concepts you need to really understand them (basically, the math formulae you mentioned) --> HOW IT WORKS

Good sources:

• Python Machine Learning book, by Sebastian Raschka (good balance between theory and practice)
• Jason Brownlee's blog and books (very applied use cases)
• scikit-learn documentation, which includes the math used in their code

Answered by German C M on December 20, 2020

It is quite true that papers or books use notation that seems obvious to people used to dealing with the mathematical side, but is meaningless to others. Ways of understanding the math include:

• Following theoretical courses or trainings
• Asking people on forums such as this one, or Cross Validated for stats formulae
• Working it out yourself by re-reading the parts of the paper/book you didn't get the first time

There are some notations/conventions that are implicitly accepted in data science / machine learning papers, such as:

• Using $$X$$ as input, $$y$$ as output, $$\theta$$ as model parameters
• Using $$\hat{y}$$ for the estimator of the true $$y$$
• Assuming that vectors are column vectors

The list would be too long to include here.
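Put together, these conventions let a dense-looking expression be read at a glance. For instance, a generic least-squares statement (shown here purely as an illustration):

```latex
% X: input matrix, y: output vector, \theta: parameters (a column vector)
% \hat{\theta} is the parameter estimate that minimizes the squared error
\hat{y} = X\theta, \qquad
\hat{\theta} = \arg\min_{\theta} \, \lVert y - X\theta \rVert^2
```

Nothing here is defined explicitly; the conventions alone tell you what each symbol means.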

Regarding the example above, what we face is a constrained optimization.

The $$\max$$ statement means that we are looking for the maximum value of the expression that follows. What appears below the $$\max$$ (namely, the $$\Delta_{ij}$$ values) is the list of "free" parameters that change the value of the expression.

The $$\max$$ statement is prefixed by $$\arg$$, which means that we are not interested in the expression's maximum value itself, but rather in the set of $$\Delta_{ij}$$ values that leads to that value.

Then we face a $$s.t.$$ ("subject to") statement: because this is no ordinary optimization, we also have to respect the constraints expressed after $$s.t.$$. Those can be inequalities, equations, membership constraints, etc., either explicit ($$\Delta_{ij} > 0$$) or more implicit.
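The original $$\Delta_{ij}$$ problem is not reproduced here, so as a stand-in, here is a minimal brute-force sketch of the same vocabulary on a toy problem: maximize $$f(x, y) = xy$$ subject to $$x + y = 1$$ and $$x, y \geq 0$$. Substituting the equality constraint ($$y = 1 - x$$) reduces it to a one-variable search, which makes the max / arg max distinction concrete:

```python
# Toy constrained optimization (not the Delta_{ij} problem from the question):
#   arg max  x * y   s.t.  x + y = 1,  x >= 0,  y >= 0
# Substituting y = 1 - x reduces it to maximizing f(x) = x * (1 - x) on [0, 1].
def f(x):
    return x * (1 - x)

candidates = [i / 100 for i in range(101)]  # grid over the feasible set [0, 1]

best_x = max(candidates, key=f)  # the "arg max": the x attaining the maximum
best_value = f(best_x)           # the "max": the maximum value itself

print(best_x, best_value)  # 0.5 0.25
```

The $$\arg\max$$ is the input $$x = 0.5$$ (so $$y = 0.5$$), while the $$\max$$ is the value $$0.25$$; an $$\arg\max$$ problem asks for the former.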

Answered by Romain Reboulleau on December 20, 2020