Data Science Asked by joann2555 on January 12, 2021
I came to data science/machine learning from a different background in computer science, and I feel that I lack experience with matrix/vector operations.
Python or MATLAB, for instance, provide awesome features like `numpy` to easily manipulate tabular data. However, the first solution that comes to my mind when coding is a `for` or `while` loop. I've faced situations where my code could easily have been reduced to one line or so with `numpy`.
I’d like to know if there are good lectures/books you can recommend to change this way of thinking when it comes to problem solving/coding approaches.
Thanks in advance.
tl;dr: refresh your linear algebra and check for built-in functions.
In addition to brevity/elegance, vectorization often dramatically improves performance. Tools like MATLAB and numpy provide high-level vectorized wrappers around optimized low-level implementations, which in the end are composed of your traditional loop operations. Because MATLAB and Python are interpreted (nuances exist, but are irrelevant here), the compiled low-level subroutines that they wrap are much faster than the same loops written in the interpreted language. Also, the libraries that they wrap, like BLAS (Basic Linear Algebra Subprograms), are so heavily optimized that it'd be difficult for someone to write a low-level subroutine that matches or exceeds their performance; many people over many decades worked to optimize them!
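To make the performance point concrete, here is a minimal sketch that times a dot product written as an explicit Python loop against `np.dot`. The helper name `dot_loop` and the array size are illustrative choices, not from the original answer, and the actual timings will depend on the machine and on how NumPy was built.

```python
import timeit

import numpy as np


def dot_loop(a, b):
    """Dot product using an explicit Python loop (hypothetical helper)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


# Illustrative data: two random vectors of 100,000 elements each.
rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

# Both approaches compute the same quantity, up to floating-point rounding.
assert np.isclose(dot_loop(a, b), np.dot(a, b))

# Time five runs of each version.
loop_time = timeit.timeit(lambda: dot_loop(a, b), number=5)
vec_time = timeit.timeit(lambda: np.dot(a, b), number=5)
print(f"loop: {loop_time:.4f} s, np.dot: {vec_time:.4f} s")
```

On typical setups the vectorized call should come out much faster, since the loop runs in the interpreter while `np.dot` hands the work to compiled code.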
Considering problems from a math-oriented perspective helps with vectorization. For example, say you wanted to find the $L_p$ norm of a vector $\vec{x}$. A traditional CS program would look something like:
norm = 0
for i in vec_x:
    norm = norm + pow(abs(i), p)
norm = pow(norm, 1/p)
but a mathematician would simply write $\|\vec{x}\|_p$. First, you should check to see if a built-in function exists to perform your task, as a mathematician might. Then you should look to vectorize using math operators such as matrix multiplication, element-wise operations, etc., which these high-level tools implement. Going back to the $L_p$ example, you could write vectorized code (in MATLAB): `(sum(abs(x).^p))^(1/p)`, but it'd be even better to write `norm(x,p)`. Knowing that such functions exist comes from a combination of familiarity with the tool and Googling.
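For readers working in Python rather than MATLAB, the same progression might look like the sketch below. The three function names are made up for illustration; `np.sum`, `np.abs`, and `np.linalg.norm` are the NumPy calls doing the work, and the sketch assumes a 1-D input and $p \geq 1$.

```python
import numpy as np


def lp_norm_loop(x, p):
    """L_p norm with an explicit loop, as in the loop version above."""
    total = 0.0
    for value in x:
        total += abs(value) ** p
    return total ** (1.0 / p)


def lp_norm_vectorized(x, p):
    """Same computation using element-wise array operations."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)


def lp_norm_builtin(x, p):
    """Delegate to NumPy's built-in vector norm."""
    return np.linalg.norm(np.asarray(x, dtype=float), ord=p)


x = [3.0, -4.0]
print(lp_norm_loop(x, 2), lp_norm_vectorized(x, 2), lp_norm_builtin(x, 2))
```

All three should agree up to floating-point rounding; the built-in is usually the one to reach for, since it also handles special cases such as `ord=np.inf`.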
Answered by Benji Albert on January 12, 2021