Saving and using the eigenvectors from a PCA


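I ran a principal component analysis (PCA) in Stata.

My dataset contains eight financial indicators for a set of countries (nine in the example below).

For example: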

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"    -.1916055239385184  .046331346724579184  .16438012750896466    .073106839282063 30.373216652548326  4.116650784492168  3.222219873614461  .01453109309122077 2010
"UK"       -.09287803170279468   .10772082765154019  .19475363707485557  .05803923583546618 31.746409646181174  9.669982727208433 1.2958094802269167 .014273374324088752 2010
"US"       -.06262935107629553   .08674901201182428   .1241593221865416  .13387194413811226 25.336612638526013  11.14330064161111  1.954785887176916 .008355601163285917 2010
"Italy"   -.038025847122363045    .1523162032749684  .23885658237030563   .2057478638900476  31.02007902336988 2.9660938817562292   6.12544787693943 .011694993164234125 2010
"Germany"  -.05454795914578491   .06287079763890834  .09347194572148769  .08730237262847926 35.614342337621174  12.03770488195981 1.1958205191308358 .012467084153714813 2010
"Spain "   -.09133982259799572    .1520056836126315  .20905656056324853  .21054797530580743 30.133833346916546 2.0623245902645073  5.122615899157435 .013545432336873187 2010
"Sweden"   -.05403262462960799   .20463787181576967  .22924827352771968  .05655833155565016  20.30540887860061 10.392313613725324  .8634381995636089 .008030624504967313 2010
"Norway "  -.07560184571862992   .08383822093909514  .15469418498932822  .06569716455818478 29.568228705840234 14.383460621594622 1.5561013535825234 .012843159364225464 2010
"Algeria"   -.0494187835163535  .056252436429004446  .09174672864585759  .08143181185307143  34.74103858167055 15.045254276254616 1.2074942921860699 .011578038401820303 2010
"France"   -.03831442432584342   .14722819896988698  .22035417794604084  .12183886462162773  28.44763045286005 12.727100288710087  1.405629911115614 .011186908059399987 2011
"UK"       -.05002189329928202   .16833493262244398   .2288402623558823  .04977050186975224 27.640103129372747  11.17376089844228 1.1764542835994092 .008386726178729322 2011
"US"        -.0871005985124144   .10270482619857023   .1523559355903486  .06775742210623094 26.840586700880362 10.783899184031576  1.454011947763254 .013501919089967212 2011
"Italy"     -.1069324103590126   -.5877872620957578 -.47469302172710803   .2004436360021364 23.133243742952658 5.3936761686065875  4.532771849692548 .012586313916956204 2011
"Germany"  -.05851794344524515   .09960345907923154    .136805115392161   .1373407846168154   32.6182637042919 14.109738344526052 1.5077699357228835 .013200993625042274 2011
"Spain "   -.10650743527105216 -.015785638597076792   .1808727613216441  .05038848927405154  28.22206251292902 10.839614113486853 1.5021425852392374 .012076771099482617 2011
"Sweden"   -.09678946710644694   .11801761803893955  .18569993056826523   .1481844716617448 27.439283362903794  5.771154420635893  5.493437819181101 .013820243145673811 2011
"Norway "  -.04263379351591438   .09931719473864983  .14469611775596314   .0796835513869996  26.68561168581991  14.06385602832082 1.5200488174887825  .01029136242440406 2011
"Algeria"  -.04871983526465598    .2139061303228528   .2728647845448156 .056537570099712456  22.50263575072073 16.919641035094685  .7539881754626142 .009734650338902404 2011
end

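After rotation, I call my first component "debt" and my second component "profitability".

I have the same data for 2011, 2012, 2013, 2014 and so on. I want to take the weight matrix that Stata computes for 2010 and apply it separately to 2011, 2012 and 2013. My goal is to compare debt and profitability across countries.

To do this, I use the estimates save and estimates use commands (chapter 20 of the Stata manual on estimation, and the help file for the PCA postestimation commands).

However, I cannot work out what Stata is actually saving. Does it save the scores computed for 2010, or the eigenvalues and eigenvectors?

Here is the code I use: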

tempfile pca
save `pca'
use `pca' if Year==2010
global xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity
pca $xlist, components(2)
estimates save pcaest, replace
predict score
summarize score
use `pca' if Year==2011, clear
estimates use pcaest
predict score
summarize score
  1. Do this approach and this code look correct to you?
  2. I would also like to save the weight matrix and create a new vector Z = b[1,1]*Investment + ...
1 Answer

Using your toy example for 2010:

clear

input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"    -.1916055239385184  .046331346724579184  .16438012750896466    .073106839282063 30.373216652548326  4.116650784492168  3.222219873614461  .01453109309122077 2010
"UK"       -.09287803170279468   .10772082765154019  .19475363707485557  .05803923583546618 31.746409646181174  9.669982727208433 1.2958094802269167 .014273374324088752 2010
"US"       -.06262935107629553   .08674901201182428   .1241593221865416  .13387194413811226 25.336612638526013  11.14330064161111  1.954785887176916 .008355601163285917 2010
"Italy"   -.038025847122363045    .1523162032749684  .23885658237030563   .2057478638900476  31.02007902336988 2.9660938817562292   6.12544787693943 .011694993164234125 2010
"Germany"  -.05454795914578491   .06287079763890834  .09347194572148769  .08730237262847926 35.614342337621174  12.03770488195981 1.1958205191308358 .012467084153714813 2010
"Spain "   -.09133982259799572    .1520056836126315  .20905656056324853  .21054797530580743 30.133833346916546 2.0623245902645073  5.122615899157435 .013545432336873187 2010
"Sweden"   -.05403262462960799   .20463787181576967  .22924827352771968  .05655833155565016  20.30540887860061 10.392313613725324  .8634381995636089 .008030624504967313 2010
"Norway "  -.07560184571862992   .08383822093909514  .15469418498932822  .06569716455818478 29.568228705840234 14.383460621594622 1.5561013535825234 .012843159364225464 2010
"Algeria"   -.0494187835163535  .056252436429004446  .09174672864585759  .08143181185307143  34.74103858167055 15.045254276254616 1.2074942921860699 .011578038401820303 2010
end

I get the following results:

local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity
pca `xlist', components(2)

Principal components/correlation                 Number of obs    =          9
                                                 Number of comp.  =          2
                                                 Trace            =          8
    Rotation: (unrotated = principal)            Rho              =     0.7468

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.43566      .896796             0.4295       0.4295
           Comp2 |      2.53887      1.23215             0.3174       0.7468
           Comp3 |      1.30672      .750756             0.1633       0.9102
           Comp4 |      .555959      .472866             0.0695       0.9797
           Comp5 |     .0830926     .0181769             0.0104       0.9900
           Comp6 |     .0649157     .0526462             0.0081       0.9982
           Comp7 |     .0122695    .00975098             0.0015       0.9997
           Comp8 |    .00251849            .             0.0003       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    ------------------------------------------------
        Variable |    Comp1     Comp2 | Unexplained 
    -------------+--------------------+-------------
      Investment |   0.0004   -0.3837 |       .6262 
          Profit |   0.3896   -0.3794 |       .1131 
          Income |   0.4621   -0.1162 |        .232 
             Tax |   0.4146    0.1236 |       .3706 
       Repayment |  -0.1829    0.4747 |       .3131 
        Leverage |  -0.4685   -0.2596 |      .07464 
        Interest |   0.4580    0.2625 |       .1045 
       Liquidity |  -0.0082    0.5643 |       .1913 
    ------------------------------------------------

To see what kinds of items the pca command returns:

 ereturn list

scalars:
                  e(N) =  9
                  e(f) =  2
                e(rho) =  .7468162625387222
              e(trace) =  8
              e(lndet) =  -13.76082122673546
               e(cond) =  36.93476257313668

macros:
            e(cmdline) : "pca Investment Profit Income Tax Repayment Leverage Interest Liquidity, components(2)"
                e(cmd) : "pca"
              e(title) : "Principal components"
       e(marginsnotok) : "_ALL"
          e(estat_cmd) : "pca_estat"
         e(rotate_cmd) : "pca_rotate"
            e(predict) : "pca_p"
              e(Ctype) : "correlation"
         e(properties) : "nob noV eigen"

matrices:
                e(sds) :  1 x 8
              e(means) :  1 x 8
                  e(C) :  8 x 8
                e(Psi) :  1 x 8
                 e(Ev) :  1 x 8
                  e(L) :  8 x 2

functions:
             e(sample)   
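
On the question of what is being saved: as far as I know, estimates save writes the current e() results to a .ster file. The file would therefore hold the eigenvalues in e(Ev), the eigenvectors in e(L), and the means and standard deviations, but no scores. Scores are ordinary variables that predict creates from whatever data are in memory. A minimal sketch to check this, assuming pca has just been run as above and reusing the file name pcaest from the question:

```stata
* Assumes -pca- was just run on the 2010 data, as shown above
matrix list e(L)        // eigenvectors (one column per retained component)
matrix list e(Ev)       // eigenvalues
matrix list e(means)    // variable means in the estimation sample
matrix list e(sds)      // variable standard deviations in the estimation sample

* Write the e() results to pcaest.ster, then inspect what the file holds
estimates save pcaest, replace
estimates describe using pcaest
```

If this is right, calling predict after estimates use recomputes the scores from the stored eigenvectors and the data currently loaded.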

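One way to carry the returned matrix of eigenvectors over to the next year's data as variables is to make a copy of the matrix and then load the 2011 data: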

matrix A = e(L)

clear

input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"   -.03831442432584342   .14722819896988698  .22035417794604084  .12183886462162773  28.44763045286005 12.727100288710087  1.405629911115614 .011186908059399987 2011
"UK"       -.05002189329928202   .16833493262244398   .2288402623558823  .04977050186975224 27.640103129372747  11.17376089844228 1.1764542835994092 .008386726178729322 2011
"US"        -.0871005985124144   .10270482619857023   .1523559355903486  .06775742210623094 26.840586700880362 10.783899184031576  1.454011947763254 .013501919089967212 2011
"Italy"     -.1069324103590126   -.5877872620957578 -.47469302172710803   .2004436360021364 23.133243742952658 5.3936761686065875  4.532771849692548 .012586313916956204 2011
"Germany"  -.05851794344524515   .09960345907923154    .136805115392161   .1373407846168154   32.6182637042919 14.109738344526052 1.5077699357228835 .013200993625042274 2011
"Spain "   -.10650743527105216 -.015785638597076792   .1808727613216441  .05038848927405154  28.22206251292902 10.839614113486853 1.5021425852392374 .012076771099482617 2011
"Sweden"   -.09678946710644694   .11801761803893955  .18569993056826523   .1481844716617448 27.439283362903794  5.771154420635893  5.493437819181101 .013820243145673811 2011
"Norway "  -.04263379351591438   .09931719473864983  .14469611775596314   .0796835513869996  26.68561168581991  14.06385602832082 1.5200488174887825  .01029136242440406 2011
"Algeria"  -.04871983526465598    .2139061303228528   .2728647845448156 .056537570099712456  22.50263575072073 16.919641035094685  .7539881754626142 .009734650338902404 2011
end

Then you can simply use the svmat command:

svmat A

list A* if _n < 9

     +-----------------------+
     |        A1          A2 |
     |-----------------------|
  1. |  .0003921    -.383703 |
  2. |  .3895898   -.3793983 |
  3. |  .4621098   -.1162487 |
  4. |  .4146066    .1235683 |
  5. | -.1828703    .4746658 |
     |-----------------------|
  6. | -.4685374   -.2596268 |
  7. |   .457974    .2624738 |
  8. | -.0081538    .5643047 |
     +-----------------------+
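
For the second part of the question (Z = b[1,1]*Investment + ...), the matrix can also be used directly, without turning it into variables first. The following is only a sketch: it assumes that matrix A (copied from e(L) after the 2010 pca) still exists and that the 2011 data are in memory. The names W2010, Z1 and Z2 are made up for illustration. The copy W2010 is there so that the matrix name cannot be confused with the variables A1 and A2 created by svmat.

```stata
* Assumes matrix A (8 x 2, from the 2010 -pca-) exists and 2011 data are loaded
matrix W2010 = A

local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity

* Accumulate the weighted sums one variable at a time;
* row r of W2010 holds the weights of the r-th variable in `xlist'
generate double Z1 = 0
generate double Z2 = 0
local r = 0
foreach v of local xlist {
    local ++r
    replace Z1 = Z1 + W2010[`r', 1] * `v'
    replace Z2 = Z2 + W2010[`r', 2] * `v'
}

list Country Z1 Z2
```

Note that these sums use the raw variables. The PCA above is based on the correlation matrix, so I would expect scores from predict to be built from standardized variables and to differ from Z1 and Z2.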

EDIT:

Revised according to the comments:

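use X1, clear

local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity

forvalues i = 1 / 5 {
    * Run the PCA on one year (the variable is Year, with a capital Y)
    * and keep a copy of its 8 x 2 eigenvector matrix
    pca `xlist' if Year == 201`i', components(2)
    matrix A201`i' = e(L)

    * Weighted sums of the raw variables, one per component.
    * Row r of the matrix holds the weights of the r-th variable in `xlist',
    * so each variable is paired with its own matrix element.
    generate B201`i'1 = (A201`i'[1,1] * Investment) + (A201`i'[2,1] * Profit) + ///
                        (A201`i'[3,1] * Income) + (A201`i'[4,1] * Tax) + ///
                        (A201`i'[5,1] * Repayment) + (A201`i'[6,1] * Leverage) + ///
                        (A201`i'[7,1] * Interest) + (A201`i'[8,1] * Liquidity)

    generate B201`i'2 = (A201`i'[1,2] * Investment) + (A201`i'[2,2] * Profit) + ///
                        (A201`i'[3,2] * Income) + (A201`i'[4,2] * Tax) + ///
                        (A201`i'[5,2] * Repayment) + (A201`i'[6,2] * Leverage) + ///
                        (A201`i'[7,2] * Interest) + (A201`i'[8,2] * Liquidity)
}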
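
If the goal is to apply the 2010 weights to every later year, as described in the question, it may be simpler to keep all years in memory, fit the PCA on 2010 only, and let predict compute the scores for all observations. This is a sketch under that assumption: the full multi-year dataset from the question is taken to be loaded, and pc1 and pc2 are illustrative variable names.

```stata
* Assumes the full dataset (all years) from the question is in memory
local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity

* Estimate the components from the 2010 observations only
pca `xlist' if Year == 2010, components(2)

* -predict- is not restricted to the estimation sample, so this should
* give scores based on the 2010 eigenvectors for every year
predict pc1 pc2, score

* Compare the scores year by year
bysort Year: summarize pc1 pc2
```

With this layout there is no need to save and reload the estimates, because the same e() results stay active while the scores are computed.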