Data Science Asked by Zðkaria Bóüsserghine on September 26, 2020
I need help with the principal components’ analysis code below in 3 ways:
Write Code to create RIM (RAKE) weighting.
I’m trying to ensure that using the code below, I get exactly the same results than in SPSS.
If you change the weights below to 1, the two codes needs to yield the same results (weighting with 1 is the same than no weighting). However the two codes below do not give the same results if you change the weighing to 1 !
Here are the python codes I wrote:
Without weighting
import numpy as np
import pandas as pd
import math
import statistics
from sklearn.decomposition
from sklearn.preprocessing import StandardScaler
d = {
'gender':[0,0,1,1,1,1,1,1],
'v1':[17,18,11,16,11,18,15,19],
'v2':[2.54,2.71,2.07,2.50,1.96,2.17,1.63,2.36],
'v3':[2.02,2.47,2.12,2.53,1.50,2.39,2.45,2.02],
'v4':[2.34,2.08,1.49,2.05,1.70,2.42,1.68,2.21]
}
df = pd.DataFrame(data=d)
x = StandardScaler().fit_transform(df.loc[:, ['v1','v2','v3','v4']])
cor_mat1 = np.corrcoef(x.T)
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
eig_pairs.sort(key=lambda x: x[0], reverse=True)
FAC1_1 = x.dot(eig_pairs[0][1].reshape(4,1))
print(FAC1_1)
**With weighting**
import numpy as np
import pandas as pd
import math
import statistics
from sklearn.preprocessing import StandardScaler
def m(x, w):
return np.sum(x * w) / np.sum(w)
def cov(x,y, w):
return np.sum(w * (x - m(x, w)) * (y - m(y, w))) / np.sum(w)
def corr(df, w):
dt = pd.DataFrame(np.nan, index=df.columns, columns=df.columns)
for i in range(df.shape[1]-1):
dt.iloc[i , i]=1
for j in range(1,df.shape[1]):
x=cov(df.iloc[: , i], df.iloc[: , j], w) / np.sqrt(cov(df.iloc[: , i],df.iloc[: , i], w) * cov(df.iloc[: , j],df.iloc[: , j], w))
dt.iloc[i , j]=x
dt.iloc[j , i]=x
dt.iloc[df.shape[1]-1 , df.shape[1]-1]=1
return dt
d = {
'gender':[0,0,1,1,1,1,1,1],
'v1':[17,18,11,16,11,18,15,19],
'v2':[2.54,2.71,2.07,2.50,1.96,2.17,1.63,2.36],
'v3':[2.02,2.47,2.12,2.53,1.50,2.39,2.45,2.02],
'v4':[2.34,2.08,1.49,2.05,1.70,2.42,1.68,2.21],
'weight':[2,2,0.666666667,0.666666667,0.666666667,0.666666667,0.666666667,0.666666667]
}
df = pd.DataFrame(data=d)
x = pd.DataFrame(df.loc[:, ['v1','v2','v3','v4']])
cor_mat1 = corr(x,df['weight'])
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
eig_pairs.sort(key=lambda x: x[0], reverse=True)
FAC1_1 = x.dot(eig_pairs[0][1].reshape(4,1))
print(FAC1_1)
Here is the SPSS syntax:
FACTOR
/VARIABLES v1 v2 v3 v4
/MISSING LISTWISE
/ANALYSIS v1 v2 v3 v4
/PRINT INITIAL EXTRACTION
/CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION PC
/ROTATION NOROTATE
/SAVE REG(ALL)
/METHOD=CORRELATION.
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP