How to weight principal components by variance?


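I am following the paper New ECOSTRESS and MODIS Land Surface Temperature Data Reveal Fine-Scale Heat Vulnerability in Cities: A Case Study for Los Angeles County, California, and I quote: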

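Based on Kaiser's rule, only those PCs with eigenvalues greater than 1 were retained for analysis. The PCs with eigenvalues greater than 1 were then rotated using a varimax rotation to improve their interpretation and maximize the dispersion of loadings across PCs. These rotated PC scores were weighted by variance and then used to reconstruct the original observations.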

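I am using the FactoMineR and psych libraries to run a PCA on 4 variables. Before the PCA, all variables were normalized to zero mean and a standard deviation of 1 (z-scores).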

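# df_adapt_pca holds the z-scored variables
adpt.pca <- PCA(df_adapt_pca, graph = FALSE, scale.unit = FALSE)
# get_eigenvalue() is provided by the factoextra package
eig.val.adpt <- get_eigenvalue(adpt.pca)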
eig.val.adpt

The results are:

      eigenvalue variance.percent cumulative.variance.percent
Dim.1  1.9746259        49.377345                    49.37735
Dim.2  1.1408281        28.527461                    77.90481
Dim.3  0.7619571        19.053441                    96.95825
Dim.4  0.1216413         3.041753                   100.00000
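
As a hedged illustration of the Kaiser rule used here, the base R sketch below builds the same three columns from the eigenvalues of a correlation matrix and counts how many exceed 1. The built-in mtcars data set is an assumption made only so the snippet runs on its own; it is not the data from this question, and the names eig, kaiser and n_keep are illustrative.

```r
# Illustrative data only: mtcars stands in for the real z-scored variables.
eig <- eigen(cor(mtcars))$values   # eigenvalues of the correlation matrix

kaiser <- data.frame(
  eigenvalue                  = eig,
  variance.percent            = 100 * eig / sum(eig),
  cumulative.variance.percent = 100 * cumsum(eig) / sum(eig)
)

# Kaiser rule: keep only the components whose eigenvalue is greater than 1
n_keep <- sum(eig > 1)

print(kaiser)
print(n_keep)
```

For standardized variables the eigenvalues sum to the number of variables, so an eigenvalue above 1 means the component carries more variance than any single input variable.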

I then selected the PCs with an eigenvalue > 1 and performed a varimax rotation as follows:

# Extract the eigenvalues for the first two principal components
eigenvalues <- eig.val.adpt[1:2, "eigenvalue"]

# Extract the loadings for the first two principal components
loadings_matrix <- adpt.pca$var$coord[, 1:2]

# Perform varimax rotation
varimax_result <- varimax(loadings_matrix)

# The rotated loadings
rotated_loadings <- varimax_result$loadings

The result of the varimax rotation:

Loadings:
          Dim.1  Dim.2 
income     0.959       
ndvi              0.817
cs_dist   -0.226  0.732
education  0.937 -0.240

               Dim.1 Dim.2
SS loadings    1.852 1.263
Proportion Var 0.463 0.316
Cumulative Var 0.463 0.779

Here is my question: how do I weight the rotated PCs by variance?


My initial attempt, based on this post, was:

# Scale the rotated PCs by the square root of the eigenvalues
weighted_rotated_PCs <- rotated_loadings %*% diag(sqrt(eigenvalues))

# Reconstruct the original data using the scores and the weighted rotated PCs
scores <- adpt.pca$ind$coord[, 1:2]

# Use the weighted rotated PCs to reconstruct the data
Xhat <- scores %*% t(weighted_rotated_PCs)

# Adjust the reconstructed data to have the same mean as the original data
mu <- colMeans(mydt)
Xhat <- scale(Xhat, center = -mu, scale = FALSE)

# Convert Xhat to a data frame
Xhat <- as.data.table(Xhat)

# column names to match original data
colnames(Xhat) <- colnames(mydt) # mydt contains the original obs

But when I compare the output of Xhat and mydt, I get:

Xhat[1,]
    income      ndvi   cs_dist education
1 2.067341 0.6560899 -1.683479  2.184048

mydt[1, ]
      income      ndvi    cs_dist education
1 0.4087451 0.1262939 -0.3533639 0.3261632

These are completely different. Are my weights wrong?
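
One possible explanation for a mismatch of this kind, offered as a sketch under assumptions rather than a diagnosis: a varimax rotation is orthogonal, so a rank-k reconstruction stays the same only when the scores are rotated by the same matrix as the loadings. In addition, if the loadings are already scaled by the square root of the eigenvalues, multiplying them by sqrt(eigenvalues) again would apply that scaling twice. The snippet uses base R (prcomp, varimax) and the built-in mtcars data as a stand-in; none of the names come from the question.

```r
# Illustrative data only: mtcars stands in for the real variables.
X    <- scale(mtcars)                       # z-scores
pc   <- prcomp(X)                           # PCA on the standardized data
k    <- 2                                   # number of retained components
sdev <- pc$sdev[1:k]                        # square roots of the eigenvalues

L <- pc$rotation[, 1:k] %*% diag(sdev)      # loadings (eigenvectors * sdev)
Z <- pc$x[, 1:k] %*% diag(1 / sdev)         # scores rescaled to unit variance

vm <- varimax(L)
R  <- vm$rotmat                             # orthogonal rotation matrix

X_k     <- Z %*% t(L)                       # rank-k reconstruction, no rotation
X_rot   <- (Z %*% R) %*% t(L %*% R)         # scores and loadings both rotated
X_mixed <- Z %*% t(L %*% R)                 # only the loadings rotated

# Rotating both sides leaves the reconstruction unchanged (R %*% t(R) = I)
print(isTRUE(all.equal(X_k, X_rot)))
# Rotating only one side generally gives a different matrix
print(max(abs(X_k - X_mixed)))
```

Because the reconstruction is in z-score units, converting back to the original scale would need multiplication by the column standard deviations before the column means are added.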

Full code:

library(data.table)
library(dplyr)
library(FactoMineR)
library(factoextra)  # provides get_eigenvalue()
library(psych)
    
wd <- "path/"
    
mydt <- read.table(paste0(wd, "mydt.csv"), sep = ",", header = TRUE)
    
# glimpse(mydt)
    
# Z-score normalize
df_adapt_pca <- as.data.frame(scale(mydt, center = TRUE, scale = TRUE))
    
adpt.pca <- PCA(df_adapt_pca, graph = FALSE, scale.unit = FALSE)
eig.val.adpt <- get_eigenvalue(adpt.pca)
eig.val.adpt

# Extract the eigenvalues for the first two principal components
eigenvalues <- eig.val.adpt[1:2, "eigenvalue"]

# Extract the loadings for the first two principal components
loadings_matrix <- adpt.pca$var$coord[, 1:2]

# Perform varimax rotation
varimax_result <- varimax(loadings_matrix)

# The rotated loadings
rotated_loadings <- varimax_result$loadings

# Scale the rotated PCs by the square root of the eigenvalues
weighted_rotated_PCs <- rotated_loadings %*% diag(sqrt(eigenvalues))

# Reconstruct the original data using the scores and the weighted rotated PCs
scores <- adpt.pca$ind$coord[, 1:2]

# Use the weighted rotated PCs to reconstruct the data
Xhat <- scores %*% t(weighted_rotated_PCs)

# Adjust the reconstructed data to have the same mean as the original data
mu <- colMeans(mydt)
Xhat <- scale(Xhat, center = -mu, scale = FALSE)

# Convert Xhat to a data frame
Xhat <- as.data.table(Xhat)

# column names to match original data
colnames(Xhat) <- colnames(mydt) # my dt contains the original obs

Below is a 20-row sample of the original dataset (before z-score normalization):

dput(mydt)
structure(list(income = c(0.0001063, 0.000106, 6.72e-05, 7.97e-05, 
0.0001197, 4.09e-05, 5.17e-05, 0.0001092, 8.62e-05, 7.27e-05, 
0.0001034, 0.0001159, 7.24e-05, 9.17e-05, 8.06e-05, 0.0001049, 
8.15e-05, 9.05e-05, 0.0001063, 5.99e-05), ndvi = c(0.434779405593872, 
0.519024193286896, 0.484442293643951, 0.358367592096329, 0.613705396652222, 
0.508738815784454, 0.705485105514526, 0.454894632101059, 0.396738857030869, 
0.408085465431213, 0.425091296434402, 0.360570818185806, 0.455742985010147, 
0.44114676117897, 0.498669385910034, 0.404618799686432, 0.51068776845932, 
0.295410215854645, 0.606453955173492, 0.46584877371788), cs_dist = c(1515.64929199219, 
3037.51879882812, 2663.20043945312, 1761.39184570312, 344.697448730469, 
252.047805786133, 5528.3486328125, 2387.2802734375, 2771.0546875, 
877.851745605469, 1342.23034667969, 3318.9130859375, 1075.06188964844, 
5190.70166015625, 739.960021972656, 4005.1572265625, 684.494079589844, 
426.935241699219, 1222.70263671875, 2597.5166015625), education = c(0.0001015, 
9.71e-05, 6.14e-05, 8.47e-05, 9.97e-05, 5.29e-05, 4.74e-05, 0.0001464, 
0.0001042, 7.53e-05, 0.0001143, 9.4e-05, 6.57e-05, 5.52e-05, 
7.98e-05, 9.5e-05, 6.98e-05, 9.64e-05, 0.0001063, 6.43e-05)), row.names = c(NA, 
-20L), class = c("data.table", "data.frame"), na.action = structure(c(`2402` = 2174L, 
`2404` = 2176L), class = "omit"), .internal.selfref = <pointer: 0x0000023dc7801200>)

R 4.4.0, RStudio 2024.04.2 Build 764, Windows 11.

r pca psych factominer
1 Answer

Following the link in the comments, I managed to solve this problem (how to weight the rotated PCs by variance) as follows:

# z-score normalization
mydt_normalized <- as.data.frame(scale(mydt))

# PCA
pca_result <- PCA(mydt_normalized, graph = FALSE)

# extract PCs with eigenvalues > 1, this step is to identify the desired # of PCs
eigenvalues <- pca_result$eig[, "eigenvalue"]
n_components <- sum(eigenvalues > 1)

# PCA with the selected number of components 
pca_selected <- PCA(mydt_normalized, ncp = n_components, graph = FALSE)

# varimax rotation
rotated_loadings <- varimax(pca_selected$var$coord)

# calculation of rotated PC scores
rotated_scores <- as.matrix(mydt_normalized) %*% rotated_loadings$loadings

# weight the rotated PC scores by variance
variance_weights <- eigenvalues[1:n_components] / sum(eigenvalues[1:n_components])
weighted_scores <- sweep(rotated_scores, 2, variance_weights, "*")

# reconstruction the original observations
reconstructed_data <- weighted_scores %*% t(rotated_loadings$loadings)

# Calculate the mean and standard deviation of the original data
original_means <- colMeans(mydt)
original_sds <- apply(mydt, 2, sd)

# Convert the reconstructed data back to original scale
reconstructed_original <- sweep(reconstructed_data, 2, original_sds, "*")
reconstructed_original <- sweep(reconstructed_original, 2, original_means, "+")

# option to display full numbers
options(scipen = 999)

# comparison of the original vs reconstructed obs
# the original observations
mydt[1, ]
      income      ndvi  cs_dist education
       <num>     <num>    <num>     <num>
1: 0.0001063 0.4347794 1515.649 0.0001015

# the reconstructed observations
income            ndvi          cs_dist          education
  <num>           <num>         <num>            <num>
1: 0.0001071987   0.4290230290  1864.3736031922  0.0001081663

I am still not sure whether this approach is correct, so if anyone has any insights, please feel free to comment.
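
One way to probe the approach, sketched here under stated assumptions and not as a verdict on it: compare a plain rank-k reconstruction with a variance-weighted one against the z-scored data. The sketch assumes the weights are the variance shares of the rotated components (the "SS loadings" reported by varimax), since after rotation the original eigenvalues no longer describe the individual components; the paper quoted in the question does not say which weights it used. The built-in mtcars data and the helper rmse are illustrative, not part of the answer above.

```r
# Illustrative data only: mtcars stands in for the real variables.
X   <- scale(mtcars)
pc  <- prcomp(X)
eig <- pc$sdev^2
k   <- sum(eig > 1)                                   # Kaiser rule

L     <- pc$rotation[, 1:k] %*% diag(pc$sdev[1:k])    # unrotated loadings
vm    <- varimax(L)
L_rot <- unclass(vm$loadings)                         # rotated loadings

# Unit-variance scores rotated by the same matrix as the loadings
Z_rot   <- pc$x[, 1:k] %*% diag(1 / pc$sdev[1:k]) %*% vm$rotmat
X_plain <- Z_rot %*% t(L_rot)                         # plain rank-k reconstruction

# Assumed weights: share of variance carried by each rotated component
ss <- colSums(L_rot^2)                                # "SS loadings"
w  <- ss / sum(ss)
X_weighted <- sweep(Z_rot, 2, w, "*") %*% t(L_rot)

# Hypothetical helper: per-variable RMSE in z-score units
rmse <- function(A, B) sqrt(colMeans((A - B)^2))
print(round(cbind(plain = rmse(X, X_plain), weighted = rmse(X, X_weighted)), 3))
```

Since the weights sum to 1, each one is below 1 and the weighted reconstruction is shrunk relative to the plain one. A weighted result is therefore better read as a variance-weighted composite than as a closest approximation of the original observations.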
