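I'm interested in estimating a Poisson fixed-effects model of the form

E[MOVE] = exp(α[STORE_COM_CODE, UPC_PRICE] + γ[STORE_COM_CODE, WEEK] + β[AGE]),

where AGE is the "age" of the observation. I'm interested in the AGE coefficients, not in the other fixed effects.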
My first attempt at the estimation was as follows:
library(readr)
Data <- read_csv("FullData.csv", col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor()))
library(fixest)
Results = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, Data, nthreads=28, verbose=1000)
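But this causes fepois to try to build the full dummy matrix for the AGE variable, which is too large to fit in memory. (There are roughly 150 million observations, and AGE goes up to about 400.)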
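For a sense of scale, here is a rough back-of-the-envelope sketch of that dense dummy matrix. It assumes one double-precision (8-byte) column per non-reference AGE level and uses the approximate figures above; fixest's actual internal storage may differ.

```r
# Approximate figures from the question (assumptions, not exact counts)
n_obs <- 150e6         # ~150 million observations
n_age <- 400           # AGE takes ~400 distinct values
bytes_per_entry <- 8   # assumes a dense matrix of doubles

# One dummy column per AGE level, minus the reference level
dense_bytes <- n_obs * (n_age - 1) * bytes_per_entry
dense_gb <- dense_bytes / 1e9
print(dense_gb)
```

That comes to roughly 479 GB for the dummy matrix alone.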
As an alternative, I tried:
Results = fepois(MOVE ~ 1 | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK + AGE, Data, nthreads=28, verbose=1000)
FE = fixef(Results)
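With this approach, the fepois call completes successfully, but the subsequent fixef call (needed to recover the age effects, which are now stored among the fixed effects) fails with the message: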
Problem getting FE, maximum iterations reached (1st order loop).NOTE: The fixed-effects are not regular, they cannot be straightforwardly interpreted. The number of references is only approximate.
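Of course, I could increase the number of iterations, but the fact that I'm getting this message suggests there may be a better approach that I'm not aware of. (Regularity is also an issue with this approach. It doesn't matter whether the estimation drops some columns from the STORE_COM_CODE^UPC_PRICE and STORE_COM_CODE^WEEK fixed effects, but I don't want it to drop any columns from the AGE fixed effects.)
How should I approach this estimation?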
As an aside: despite setting nthreads, fepois still uses only one thread. Any ideas? (Calling setFixest_nthreads(28) doesn't seem to make a difference either.)
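Update 1: Setting iter=100000000 in the fixef call makes no difference. I still get the same error, which suggests that the iteration limit being hit is a different one.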
Update 2: Here are the first 10,000 rows of the dataset: https://gist.github.com/tholden/7cf0b4b8ae2b6030b60b704766903612
Update 3: getFixest_nthreads() returns 28, as expected (that's the value I set, and it's half the number of logical processors on my machine).
If I understand your question correctly, you get something like this:
library(fixest)
library(readr)
examp_dat1 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor()))
mod = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat1)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
#> The variable 'AGE224' has been removed because of collinearity (see $collin.var).
mod
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315, STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE)
#> Estimate Std. Error z value Pr(>|z|)
#> AGE3 -0.012467 11.6001 -0.001075 0.99914
#> AGE4 0.049981 23.2149 0.002153 0.99828
#> AGE5 -0.105345 34.8334 -0.003024 0.99759
#> AGE6 -0.161140 46.4345 -0.003470 0.99723
#> AGE7 -0.234467 58.0617 -0.004038 0.99678
#> AGE8 -0.172549 69.6805 -0.002476 0.99802
#> AGE9 -0.130779 81.2899 -0.001609 0.99872
#> AGE10 -0.112788 92.8970 -0.001214 0.99903
#> ... 324 coefficients remaining (display them with summary() or use argument n)
#> ... 1 variable was removed because of collinearity (AGE224)
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,241.4 Adj. Pseudo R2: 0.249849
#> BIC: 33,928.0 Squared Cor.: 0.551105
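What's happening is that you import AGE as a factor, so fepois estimates a separate coefficient for every level except the reference level. If you're interested in the effect of age, all you need to do is coerce it to numeric, or omit AGE = col_factor() when importing: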
examp_dat2 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor()))
mod2 = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat2)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
mod2
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315, STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE)
#> Estimate Std. Error z value Pr(>|z|)
#> AGE 1.3405 57551.2 2.3e-05 0.99998
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,567.5 Adj. Pseudo R2: 0.250126
#> BIC: 31,544.9 Squared Cor.: 0.504288
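If the data have already been read with AGE as a factor, here is a minimal sketch of the conversion, using a hypothetical toy vector rather than the real data. The key detail is that as.numeric() on a factor returns the internal level codes, so the factor has to go through as.character() first. Keep in mind that a numeric AGE fits a single log-linear age slope, rather than one coefficient per age as in the factor version.

```r
# Hypothetical toy vector standing in for a column read with col_factor()
age_f <- factor(c("10", "3", "224", "3"))

# Pitfall: as.numeric() on a factor returns level codes, not the ages themselves
codes <- as.numeric(age_f)

# Correct: convert to character first, then to numeric
age_num <- as.numeric(as.character(age_f))
print(age_num)

# Applied to the sample data, this would be (not run here):
# examp_dat1$AGE <- as.numeric(as.character(examp_dat1$AGE))
```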
As for setFixest_nthreads(): for whatever reason, if you want to throw all available threads at the problem, you need to set setFixest_nthreads(nthreads = 0).
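Here is a minimal sketch of setting and checking the thread count, assuming a fixest version in which nthreads = 0 means "all available threads", as described above. The per-call variant reuses the model from this answer and is left commented out so the snippet runs without downloading the sample data.

```r
library(fixest)

# 0 asks fixest to use every available thread for subsequent estimations
setFixest_nthreads(0)

# Check how many threads fixest will actually use
n_threads <- getFixest_nthreads()
print(n_threads)

# The thread count can also be passed per call (not run here):
# mod2 <- fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK,
#                data = examp_dat2, nthreads = 0)
```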