Consider the following example data:
  psu  | sumsc  sumst  sumobc  sumother  sumcaste
-------+------------------------------------------
 10018 |     3      2       0         4         9
 10061 |     0      0       2         5         7
 10116 |     1      1       2         4         8
 10121 |     3      0       1         2         6
 20002 |     4      1       0         1         6
--------------------------------------------------
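Within each psu, I want to rank the variables sumsc, sumst, sumobc and sumother by their percentage contribution to sumcaste (which is the total of all of these variables).
Can anyone help me do this in Stata?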
First, we input the data:
clear all
set more off
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
Second, we prepare for the reshape:
local j = 1
foreach var of varlist sumsc sumst sumobc sumother {
    gen temprl`j' = `var' / sumcaste
    ren `var' addi`j'
    local ++j
}
reshape long temprl addi, i(psu) j(ord)
lab def ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
lab val ord ord
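As an aside, the renaming loop and the value labels might be avoided by reshaping on the shared `sum` prefix. The following is only a sketch under that assumption: it relies on the `string` option of `reshape long`, and the names `total`, `group`, `value` and `share` are illustrative, not part of the original answer.

```stata
* Sketch: reshape on the shared "sum" prefix, using the data from the question.
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

* Rename the total so that it no longer shares the "sum" stub.
rename sumcaste total

* With the string option, j() holds the suffixes sc, st, obc and other.
reshape long sum, i(psu) j(group) string

* Illustrative names: value is the count, share its fraction of the total.
rename sum value
generate share = value / total

list, sepby(psu) noobs
```

The result has one row per psu and group, so the sorting step that follows would work on `share` in the same way as on `temprl`.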
Third, we sort before presenting:
gsort psu -temprl
by psu: gen nro=_n
drop temprl
order psu nro ord
Fourth, we present the data:
br psu nro ord addi
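Note that numbering the sorted observations with `_n` gives tied shares different ranks in an arbitrary order. If ties should share a rank, one possible variation (a sketch, not the original answer's method) is the `rank()` function of `egen` with the `field` option, which ranks the highest value 1 and gives tied values the same rank. The variable name `nro_field` is illustrative.

```stata
* Sketch: same preparation as above, but ties receive the same rank.
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local j = 1
foreach var of varlist sumsc sumst sumobc sumother {
    generate temprl`j' = `var' / sumcaste
    rename `var' addi`j'
    local ++j
}
reshape long temprl addi, i(psu) j(ord)

* field: the highest share is ranked 1; tied shares get the same rank,
* and the rank after a tie is skipped (1, 2, 2, 4).
bysort psu: egen nro_field = rank(temprl), field

gsort psu nro_field
list psu nro_field ord addi, sepby(psu) noobs
```

For psu 20002, for example, sumst and sumother both contribute 1 of 6 and are both ranked 2, while sumobc is ranked 4.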
EDIT:
Here is a combination of Aron's solution with mine (@PearlySpencer):
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
local i = 0
foreach var of varlist sumsc sumst sumobc sumother {
    local ++i
    generate pct`i' = 100 * `var' / sumcaste
    rename `var' temp`i'
    local rvars "`rvars' r`i'"
}
rowranks pct*, generate("`rvars'") field lowrank
reshape long pct temp r, i(psu) j(name)
label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name
keep psu name pct r
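* Convert field ranks (1, 2, 2, 4) to dense ranks (1, 2, 2, 3). A new variable is
* generated so that r[_n-1] always refers to the original rank, not an updated one.
bysort psu (r): generate rdense = sum(r != r[_n-1])
drop r
rename rdense r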
This gives you the desired output:
list, sepby(psu) noobs
+---------------------------------+
| psu name pct r |
|---------------------------------|
| 10018 sumother 44.44444 1 |
| 10018 sumsc 33.33333 2 |
| 10018 sumst 22.22222 3 |
| 10018 sumobc 0 4 |
|---------------------------------|
| 10061 sumother 71.42857 1 |
| 10061 sumobc 28.57143 2 |
| 10061 sumsc 0 3 |
| 10061 sumst 0 3 |
|---------------------------------|
| 10116 sumother 50 1 |
| 10116 sumobc 25 2 |
| 10116 sumst 12.5 3 |
| 10116 sumsc 12.5 3 |
|---------------------------------|
| 10121 sumsc 50 1 |
| 10121 sumother 33.33333 2 |
| 10121 sumobc 16.66667 3 |
| 10121 sumst 0 4 |
|---------------------------------|
| 20002 sumsc 66.66666 1 |
| 20002 sumst 16.66667 2 |
| 20002 sumother 16.66667 2 |
| 20002 sumobc 0 3 |
+---------------------------------+
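The last step of the code above turns field ranks, which skip a number after a tie, into consecutive (dense) ranks. A toy sketch of that idiom with made-up ranks follows; the variables `id`, `r` and `dense` are hypothetical. Writing the result to a new variable keeps the comparison with `r[_n-1]` on the original ranks, which appears to matter when a group contains more than one tie.

```stata
* Sketch: field ranks to dense ranks on made-up data.
clear
input id r
1 1
1 2
1 2
1 4
2 1
2 1
2 3
2 3
end

* Within each id, the running sum counts how often the sorted rank changes.
* For the first observation of a group, r[_n-1] is missing, so the count starts at 1.
bysort id (r): generate dense = sum(r != r[_n-1])

list, sepby(id) noobs
```

Here id 1 goes from 1, 2, 2, 4 to 1, 2, 2, 3, and id 2 goes from 1, 1, 3, 3 to 1, 1, 2, 2.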
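This approach is useful if you need the variables for further analysis rather than just displaying the results.
First, you need to calculate the percentages: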
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
foreach var of varlist sumsc sumst sumobc sumother {
    generate pct_`var' = 100 * `var' / sumcaste
}
egen pcttotal = rowtotal(pct_*)
list pct_* pcttotal, abbreviate(15) noobs
+--------------------------------------------------------------+
| pct_sumsc pct_sumst pct_sumobc pct_sumother pcttotal |
|--------------------------------------------------------------|
| 33.33333 22.22222 0 44.44444 100 |
| 0 0 28.57143 71.42857 100 |
| 12.5 12.5 25 50 100 |
| 50 0 16.66667 33.33333 100 |
| 66.66666 16.66667 0 16.66667 99.99999 |
+--------------------------------------------------------------+
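The 99.99999 in the last row is most likely a rounding artifact: `generate` and `egen` store results as float unless told otherwise. A minimal sketch that stores the percentages and their total as double instead (an assumption about the cause, and not needed for the ranking itself):

```stata
* Sketch: store percentages in double precision to reduce rounding in the total.
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

foreach var of varlist sumsc sumst sumobc sumother {
    generate double pct_`var' = 100 * `var' / sumcaste
}
egen double pcttotal = rowtotal(pct_*)

list pct_* pcttotal, abbreviate(15) noobs
```

Tied percentages are equal in either storage type here, because they come from the same computation, so the ranks should be unaffected.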
Then you need to get the ranks and do some gymnastics:
rowranks pct_*, generate(r_sumsc r_sumst r_sumobc r_sumother) field lowrank
mkmat r_*, matrix(A)
matrix A = A'
svmat A, names(row)
local matnames : rownames A
quietly generate name = " "
forvalues i = 1 / `: word count `matnames'' {
    quietly replace name = substr(`"`: word `i' of `matnames''"', 3, .) in `i'
}
ds row*
foreach var in `r(varlist)' {
    sort `var' name
    generate `var'b = sum(`var' != `var'[_n-1])
    drop `var'
    rename `var'b `var'
    list name `var' if name != " ", noobs
    display ""
}
The above will give you what you want:
+-----------------+
| name row1 |
|-----------------|
| sumother 1 |
| sumsc 2 |
| sumst 3 |
| sumobc 4 |
+-----------------+
+-----------------+
| name row2 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+
+-----------------+
| name row3 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+
+-----------------+
| name row4 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumobc 3 |
| sumst 4 |
+-----------------+
+-----------------+
| name row5 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumst 2 |
| sumobc 3 |
+-----------------+
Note that before running the code above, you first need to install the community-contributed command rowranks:
net install pr0046.pkg