
What exactly does it mean when we say that PCA and LDA are linear methods of learning data representations?

Data Science Asked by Ankita Talwar on September 5, 2021

I have been reading about representation learning and have come across the idea that PCA and LDA are linear methods of data representation, whereas auto-encoders provide a non-linear one. Does this mean that the embedding learned by PCA can only be transformed linearly to reproduce the data points?

One Answer

LDA is a linear classifier, since the classification boundary in LDA has the following form:

$$ \delta_i(x) = x^T \Sigma^{-1} \mu_i - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln \pi_i $$

As you can see, the above equation is linear in $x$. Here $\pi_i$ is the prior probability of class $i$, estimated from the training data, and $\mu_i$ is the mean of class $i$, also estimated from the training data. $\Sigma$ is the common covariance matrix of all classes (it is assumed to be the same for every class in LDA, which is why the boundary is linear; if you drop that assumption, the classification boundary is no longer linear in $x$).
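
For concreteness, here is a minimal NumPy sketch of that decision rule. The class parameters below are made up for illustration; in practice $\pi_i$, $\mu_i$ and $\Sigma$ are estimated from the training data:

```python
import numpy as np

# Illustrative (hand-picked) class parameters for a 2-class, 2-feature problem.
mu = np.array([[0.0, 0.0], [2.0, 2.0]])      # class means mu_i
pi = np.array([0.5, 0.5])                     # class priors pi_i
sigma = np.array([[1.0, 0.2], [0.2, 1.0]])    # shared covariance Sigma
sigma_inv = np.linalg.inv(sigma)

def discriminant(x, i):
    """delta_i(x) = x^T Sigma^{-1} mu_i - 0.5 mu_i^T Sigma^{-1} mu_i + log pi_i  (linear in x)."""
    return x @ sigma_inv @ mu[i] - 0.5 * mu[i] @ sigma_inv @ mu[i] + np.log(pi[i])

# Classify a point by picking the class with the largest discriminant value.
x = np.array([1.0, 0.5])
predicted_class = int(np.argmax([discriminant(x, i) for i in range(len(pi))]))
print(predicted_class)
```

Because each $\delta_i(x)$ is an affine function of $x$, the set where two discriminants are equal (the decision boundary) is a hyperplane.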

PCA, on the other hand, is not a regression/classification algorithm. It is a feature extraction / dimensionality reduction method that represents your data in a lower-dimensional space. It extracts the $k$ most significant directions of variation in your data, where $k$ is chosen by you depending on how many dimensions you want to keep. In the least-squares sense, PCA gives the best linear representation of your data in that lower-dimensional space. The transformation is linear because of the following equation:

$$ y = W^T x $$

As you can see, this equation is also linear in $x$. Here the matrix $W$ is obtained from the covariance matrix of your data: the columns of $W$ are eigenvectors of the covariance matrix of $x$. The first column is the eigenvector corresponding to the largest eigenvalue, because that eigenvector gives the direction of maximum variation in your data (this is proved in the optimization derivation of PCA). The second column corresponds to the second-largest eigenvalue, because it represents the direction with the second-highest variation, and so on. You can refer to the notes of Prof. Ali Ghodsi or Prof. Andrew Ng for the proof of why the eigenvectors are selected in this order in PCA.
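
Here is a minimal sketch, using plain NumPy on synthetic data, of how $W$ is built from the eigenvectors of the covariance matrix and applied as $y = W^T x$ (in practice you would usually use a library implementation such as sklearn.decomposition.PCA):

```python
import numpy as np

# Synthetic data: 200 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_centered = X - X.mean(axis=0)               # PCA is applied to mean-centered data

cov = np.cov(X_centered, rowvar=False)        # 5 x 5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns eigenvalues in ascending order

k = 2
W = eigvecs[:, ::-1][:, :k]                   # columns = top-k eigenvectors (largest eigenvalues first)
Y = X_centered @ W                            # y = W^T x applied to every sample -> shape (200, k)
print(Y.shape)
```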

Now coming back to your question about reproducing the data only linearly. Yes, in PCA you can only reproduce your data through a linear transformation. Why would you want to reconstruct your data non-linearly when the transformation itself was linear? Even if you wanted to reconstruct your data non-linearly from the transformed data $y$, do you have enough information to do so? All you have is a matrix $W$, with which you linearly transformed your data $x$ into a lower-dimensional space. A non-linear reconstruction might look like this:

$$ x = y^T W_1^T y + W^{-T} y $$

This is a non-linear reconstruction. But do you know what $W_1$ is here? Even if you try to estimate $W_1$, you will end up with a zero matrix, since $y = W^T x$ and so $W^{-T} y = x$ already accounts for the data. So any estimation will give you nothing but zeros as the coefficients of the non-linear terms (terms like $y^T y$).
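
To see that the reconstruction you actually have access to is linear, here is a continuation of the NumPy sketch above. Given only $W$ (with orthonormal columns), the natural least-squares reconstruction from the embedding is $\hat{x} = W y$, which is again a purely linear map; the data and names are illustrative:

```python
import numpy as np

# Same synthetic setup as before.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_centered = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
W = eigvecs[:, ::-1][:, :2]                   # top-2 principal directions

Y = X_centered @ W                            # forward map:  y = W^T x
X_hat = Y @ W.T                               # backward map: x_hat = W y (linear reconstruction)
print(np.mean((X_centered - X_hat) ** 2))     # error comes only from the discarded directions
```

Nothing in $W$ tells you how to add a non-linear term on top of this; that extra flexibility is exactly what an auto-encoder's non-linear decoder provides.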

Correct answer by Ruchit Vithani on September 5, 2021
