TransWikia.com

Python sklearn PCA transform function output does not match

Data Science Asked by shaifali Gupta on December 8, 2020

I am fitting PCA on some data with 10 components and then projecting the data onto the first 3 of them:

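import numpy
from sklearn.decomposition import PCA
transformer = PCA(n_components=10)
trained = transformer.fit(train)
# project the raw (uncentered) data onto the first 3 components
one = numpy.matmul(train, numpy.transpose(trained.components_[:3, :]))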

Here trained.components_[:3,:] is:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537378e-04],
       [-1.05419313e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389303e-05, -2.07920000e-03],
       [ 8.63072772e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151330e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328996e-05, -2.25542180e-03]])

Then I fit PCA with only 3 components:

transformer = PCA(n_components=3)
trained = transformer.fit(train)
two = trained.transform(train)

Here the components are:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537377e-04],
       [-1.05419314e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389307e-05, -2.07919994e-03],
       [ 8.63072765e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151331e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328994e-05, -2.25542175e-03]])

But one does not come out equal to two, even though the components are the same in both cases. They differ because the transform function first subtracts the mean vector from the data and then multiplies by the components. But why should the mean be subtracted here, given that it was already subtracted in the first step, when the PCA basis was computed?
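The mismatch described above can be reproduced with a minimal sketch. The data here is synthetic (the original train array is not available), and svd_solver="full" is set explicitly only to keep the two fits deterministic; both are assumptions made for illustration. The sketch suggests that the components describe directions through the centered data, so raw data projected onto them differs from transform(train) by a constant offset, namely the projection of the mean.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for `train` (assumption: 17 features, nonzero mean)
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(200, 17))

pca10 = PCA(n_components=10, svd_solver="full").fit(train)
pca3 = PCA(n_components=3, svd_solver="full").fit(train)

first3 = pca10.components_[:3, :]

# Projection of the raw data, as in the question
one_raw = np.matmul(train, first3.T)
# Projection after subtracting the mean learned during fit
one_centered = np.matmul(train - pca10.mean_, first3.T)
# What sklearn computes
two = pca3.transform(train)

# Constant offset between the raw projection and transform()
offset = np.matmul(pca10.mean_, first3.T)

print(np.allclose(one_centered, two))
print(np.allclose(one_raw - two, offset))
```

The mean subtracted during fit is only used to find the basis; it is not applied to the array passed in, so any later projection has to subtract mean_ again to land in the same coordinate system.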

One Answer

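If you look at the source code, PCA is computed through an SVD of the mean-centered data. Depending on svd_solver, that SVD is either an exact decomposition or an approximate one (randomized or ARPACK) that stops once the result is "good enough," which may account for differences in the last digits of the components between the two fits. The transform method subtracts the stored mean_ before projecting, so data projected by hand has to be centered the same way.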

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py
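As a rough illustration of the SVD route, the sketch below centers a small synthetic matrix, runs numpy.linalg.svd on it, and compares the result with a fitted PCA. The data, the variable names, and the choice of svd_solver="full" are assumptions for the example, not taken from the question. Rows are compared up to sign because sklearn applies its own sign convention to the singular vectors.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with clearly separated variances per feature (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5]) + 10.0

# Center the data, then take a thin SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(n_components=3, svd_solver="full").fit(X)

# Right singular vectors match components_ up to a sign per row
same_up_to_sign = np.allclose(np.abs(Vt[:3]), np.abs(pca.components_))
# Squared singular values, scaled by (n_samples - 1), give the variances
same_variance = np.allclose(S[:3] ** 2 / (X.shape[0] - 1), pca.explained_variance_)

print(same_up_to_sign, same_variance)
```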

Answered by Carl Rynegardh on December 8, 2020
