R fixest: the main variables of interest are fixed effects in a very large dataset


I am interested in estimating a Poisson fixed-effects model:

\log{\mathbb{E}[y_{i,j,t}]}=\beta_{A(i,j,t)}+\alpha_{i,t}+\gamma_{i,j}

where A(i,j,t)\in\mathbb{N} is the "age" of observation (i,j,t).

I am interested in the \beta_{\cdot} coefficients, not in the other fixed effects.

My first estimation attempt was as follows:

library(readr)
Data <- read_csv("FullData.csv", col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor()))
library(fixest)
Results = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, Data, nthreads=28, verbose=1000)

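But this results in fepois attempting to create the full dummy matrix from the AGE variable, which is too large to fit in memory. (There are about 150 million observations, and AGE goes up to about 400.)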
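
A rough back-of-the-envelope check of why a dense dummy matrix of that shape cannot fit in memory. This is only a sketch: it assumes one 8-byte double per cell, which is an assumption about a dense representation and not a statement about how fepois stores the matrix internally.

```r
# Figures from the question: ~150 million rows, AGE with ~400 levels.
n_obs <- 1.5e8
n_levels <- 400
bytes_per_cell <- 8  # assumption: dense matrix of doubles

# Size of the dense dummy matrix in decimal gigabytes.
dense_gb <- n_obs * n_levels * bytes_per_cell / 1e9
dense_gb
```

Under that assumption the matrix alone would need on the order of 480 GB.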

As an alternative, I tried:

Results = fepois(MOVE ~ 1 | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK + AGE, Data, nthreads=28, verbose=1000)
FE = fixef(Results)

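With this approach, the fepois call completes successfully, but the subsequent fixef call (needed to obtain the fixed effects, where the \beta_{\cdot} are now stored) fails with the message: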

Problem getting FE, maximum iterations reached (1st order loop).
NOTE: The fixed-effects are not regular, they cannot be straightforwardly interpreted. The number of references is only approximate.

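Of course, I could increase the number of iterations, but the fact that I get this message suggests there may be a better approach that I am not aware of. (The "regularity" is also a problem for this approach. I do not mind if the estimation drops some columns from the \alpha_{\cdot} and \gamma_{\cdot} fixed effects, but I do not want it to drop any columns from the \beta_{\cdot} fixed effects.)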
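
For reference, a minimal sketch of this second approach on simulated data. The data frame, its column names (store, upc, week, age) and all sizes are made up for illustration. To the best of my knowledge, fixef() in fixest takes fixef.iter and fixef.tol arguments that control the loop recovering the fixed effects; treat the exact values used here as arbitrary.

```r
# Sketch on simulated data; all column names and sizes are hypothetical.
if (requireNamespace("fixest", quietly = TRUE)) {
  library(fixest)
  set.seed(42)
  n <- 20000
  d <- data.frame(
    store = sample(10, n, replace = TRUE),
    upc   = sample(20, n, replace = TRUE),
    week  = sample(30, n, replace = TRUE),
    age   = sample(15, n, replace = TRUE)
  )
  d$y <- rpois(n, lambda = exp(0.5 - 0.03 * d$age))

  # AGE enters as a third fixed effect instead of as dummy regressors.
  est <- fepois(y ~ 1 | store^upc + store^week + age, data = d)

  # Recover the fixed effects with a larger iteration cap and tighter tolerance.
  fe <- fixef(est, fixef.iter = 50000, fixef.tol = 1e-8)
  print(names(fe))     # one list element per fixed-effect dimension
  print(head(fe$age))  # age effects, identified only up to a normalization
}
```

In this toy example age is drawn independently of store, product and week. In the real data age presumably moves one-for-one with the week within a store and product, which may be part of why the fixed effects are reported as not regular.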

How should I approach this estimation?


As an aside: despite setting nthreads, fepois still uses only one thread. Any ideas? (Calling setFixest_nthreads(28) does not seem to make any difference either.)


Update 1: Setting iter=100000000 in the fixef call makes no difference. I still get the same error, which suggests that a different iteration limit is being hit.

Update 2: Here are the first 10000 rows of the dataset: https://gist.github.com/tholden/7cf0b4b8ae2b6030b60b704766903612 (*)

Update 3: getFixest_nthreads() returns 28, as expected (this is what I set, and it is half the number of logical processors on my machine).

1 Answer

If I understand your question correctly, you get something like this:

library(fixest)
library(readr)


examp_dat1 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor())) 


mod = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat1)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
#> The variable 'AGE224' has been removed because of collinearity (see $collin.var).
  
mod
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315,  STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE) 
#>        Estimate Std. Error   z value Pr(>|z|) 
#> AGE3  -0.012467    11.6001 -0.001075  0.99914 
#> AGE4   0.049981    23.2149  0.002153  0.99828 
#> AGE5  -0.105345    34.8334 -0.003024  0.99759 
#> AGE6  -0.161140    46.4345 -0.003470  0.99723 
#> AGE7  -0.234467    58.0617 -0.004038  0.99678 
#> AGE8  -0.172549    69.6805 -0.002476  0.99802 
#> AGE9  -0.130779    81.2899 -0.001609  0.99872 
#> AGE10 -0.112788    92.8970 -0.001214  0.99903 
#> ... 324 coefficients remaining (display them with summary() or use argument n)
#> ... 1 variable was removed because of collinearity (AGE224)
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,241.4   Adj. Pseudo R2: 0.249849
#>            BIC:  33,928.0     Squared Cor.: 0.551105

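What is happening is that you treat AGE as a factor when importing the data, so fepois estimates a coefficient for every level except the reference. If you are interested in the effect of age, all you need to do is coerce it to numeric, or omit AGE = col_factor() on import: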


examp_dat2 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor())) 



mod2 = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat2)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
  
mod2
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315,  STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE) 
#>     Estimate Std. Error z value Pr(>|z|) 
#> AGE   1.3405    57551.2 2.3e-05  0.99998 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,567.5   Adj. Pseudo R2: 0.250126
#>            BIC:  31,544.9     Squared Cor.: 0.504288
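
If the data are already loaded with AGE as a factor, the coercion needs a little care: as.numeric() applied to a factor returns the internal level codes, not the labels. A minimal base-R sketch with a made-up vector (the values are hypothetical, not taken from the sample data):

```r
# Hypothetical factor whose labels are ages, with levels in order of first
# appearance (readr's col_factor() infers levels this way by default).
age_f <- factor(c("10", "3", "10", "400"), levels = c("10", "3", "400"))

as.numeric(age_f)                           # level codes: 1 2 1 3
age_num <- as.numeric(as.character(age_f))  # convert via the labels instead
age_num                                     # 10 3 10 400
```

Note that a numeric AGE gives a single slope for age, as in mod2, and not one coefficient per age level.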

As for setFixest_nthreads(): for whatever reason, if you want to throw all available threads at the problem, you need to set

setFixest_nthreads(nthreads = 0)
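
A quick way to check which thread count is actually in effect. As far as I understand the fixest documentation, nthreads = 0 means all available threads and a value strictly between 0 and 1 is read as a fraction of them. Multithreading also depends on the package having been built with OpenMP support, which is one possible (unconfirmed) reason for seeing only a single thread in use.

```r
# Logical processors as reported by base R's 'parallel' package.
n_logical <- parallel::detectCores()
n_logical

if (requireNamespace("fixest", quietly = TRUE)) {
  fixest::setFixest_nthreads(0)        # 0: request all available threads
  print(fixest::getFixest_nthreads())  # thread count fixest will now use
}
```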
