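I'd like to know how to summarise the status of a particular ID across multiple rows. I know how to do this in SAS using a BY statement with first./last. processing, but I'm not sure how to achieve the same thing in R.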
Here is an example of the data to be summarised. For each ID, the colour column should be summarised as "red only", "blue only" or "both".
Generate the data frame:
example <- data.frame(id = c("A1", "A1", "A1", "A2", "A3", "A3", "A4", "A4", "A4", "A5", "A5", "A6"),
                      colour = c("red", "red", "blue", "red", "blue", "blue", "red", "red", "red", "red", "blue", "red"))
Which gives:
id colour
1 A1 red
2 A1 red
3 A1 blue
4 A2 red
5 A3 blue
6 A3 blue
7 A4 red
8 A4 red
9 A4 red
10 A5 red
11 A5 blue
12 A6 red
Desired result:
id status
1 A1 both
2 A2 red only
3 A3 blue only
4 A4 red only
5 A5 both
6 A6 red only
The equivalent code in SAS is:
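data table1 (keep=id status);
  set example;
  by id;                       /* BY-group processing requires example to be sorted by id */
  length status $9;            /* without this, the first assignment ("both") fixes the length at 4 and truncates "red only" / "blue only" */
  retain red_count blue_count;
  /* reset the counters on the first row of each id */
  if first.id then do;
    red_count = 0;
    blue_count = 0;
  end;
  if colour = "red" then red_count + 1;
  if colour = "blue" then blue_count + 1;
  /* write one summary row on the last row of each id */
  if last.id then do;
    if red_count > 0 and blue_count > 0 then status = "both";
    else if red_count > 0 then status = "red only";
    else if blue_count > 0 then status = "blue only";
    output;
  end;
run;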
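For comparison, here is a minimal base-R sketch of the same BY-group logic. It rebuilds the example data so it runs on its own; `colour_status` is a hypothetical helper name, not part of any package. `aggregate()` splits the rows by `id` (much like `BY id`) and calls the helper once per group, returning one row per id, so there is no need to track first/last rows explicitly.

```r
# Rebuild the sample data from the question so this block runs on its own
example <- data.frame(
  id = c("A1", "A1", "A1", "A2", "A3", "A3", "A4", "A4", "A4", "A5", "A5", "A6"),
  colour = c("red", "red", "blue", "red", "blue", "blue", "red", "red", "red", "red", "blue", "red")
)

# Hypothetical helper: classify all colours seen for one id.
# Only the presence of each colour matters, so any() replaces the SAS counters.
colour_status <- function(colours) {
  has_red <- any(colours == "red")
  has_blue <- any(colours == "blue")
  if (has_red && has_blue) {
    "both"
  } else if (has_red) {
    "red only"
  } else if (has_blue) {
    "blue only"
  } else {
    NA_character_  # neither colour present (the SAS step leaves status blank here)
  }
}

# aggregate() groups colour by id (like BY id) and returns one row per id
# (like the output at last.id), ordered by id
table1 <- aggregate(colour ~ id, data = example, FUN = colour_status)
names(table1)[names(table1) == "colour"] <- "status"
print(table1)
```

Because the helper only checks whether each colour occurs, it does not need the running counts that the SAS step keeps; if the counts themselves were needed, `table(example$id, example$colour)` gives them per id.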