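I have a questionnaire whose items are coded 1-5, with missing values recorded as a period (.). How can I code the data to reflect the following rule?
If a patient has >= 80% of the values non-missing, the missing values should be coded as the mean of the questions that patient answered. If a patient is missing more than 80% of the values, then set the summary measure to missing and drop the record.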
data condomuse;
set int108;
run;
proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;
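Working from the following assumptions:
NMISS(), N(), CMISS() and DIM() are functions that can be used with arrays.
This will identify all records that are missing 80% or more of their values.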
data temp; *temp is output data set name;
set have; *have is input data set name;
*create an array to avoid listing all variables later;
array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
*calculate percent missing;
Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);
if percent_missing >= 0.8 then exclude = 'Y';
else exclude = 'N';
run;
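To see what the flagging rule does, here is a minimal sketch on a made-up data set. The data set names `toy` and `toy_flag` and the items `q1-q5` are hypothetical stand-ins, not the CUSES variables from the question. With five items, a record with four missing values has 80% missing and should be flagged for exclusion.

```sas
* hypothetical toy data with five items coded 1-5 and . for missing;
data toy;
    input id q1-q5;
    datalines;
1 1 2 3 4 5
2 . . . . 3
3 2 . 4 . 1
;
run;

data toy_flag;
    set toy;
    array items(*) q1-q5;
    * DIM gives the number of variables in the array;
    n_items = dim(items);
    * NMISS counts missing numeric values and N counts non-missing ones;
    n_missing = nmiss(of items(*));
    n_answered = n(of items(*));
    percent_missing = n_missing / n_items;
    if percent_missing >= 0.8 then exclude = 'Y';
    else exclude = 'N';
run;

* inspect the counts and the flag for each record;
proc print data=toy_flag noobs;
run;
```

Printing the intermediate counts next to the flag makes it easy to check the threshold before applying the same logic to the real variables.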
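To replace missing values with the mean, or to use a different method, PROC STDIZE can do this. Note that with REPONLY and METHOD=MEAN each missing value is filled with that variable's mean across the records read, not with the mean of the respondent's own answers.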
*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records not flagged for exclusion (less than 80% missing);
where exclude = 'N';
*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
run;
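If the goal is instead the mean of the questions each patient answered (a per-row mean rather than a per-variable mean), a DATA step is one possible alternative. This is a sketch only: it assumes the `temp` data set and `exclude` flag created earlier, and the names `temp_rowmean`, `row_mean` and `i` are hypothetical.

```sas
data temp_rowmean;
    set temp;
    *keep only records not flagged for exclusion;
    where exclude = 'N';
    array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
    *MEAN ignores missing values and so averages only the answered questions;
    row_mean = mean(of vars_check(*));
    *fill each missing item with the mean computed before any filling;
    do i = 1 to dim(vars_check);
        if missing(vars_check(i)) then vars_check(i) = row_mean;
    end;
    drop i;
run;
```

The row mean is computed once before the loop, so the imputed values do not feed back into the average. Filled values will generally not be whole numbers on the 1-5 scale.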
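The different methods available in PROC STDIZE are listed here, but these are standardization methods rather than imputation methods.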