I want to group the data by mcode and create two different types of rows for each group.
Here is the sample data.
   Cat1 Cat2 Cat3 mcode key   pcode needed
1  C1   C2   C31  B3100 TRUE  P001  P001
2  C1   C2   C31  B3100 FALSE P002  P002
3  C1   C2   C31  B5500 TRUE  P003  P003
4  C1   C2   C31  B5500 FALSE P004  NA
5  C1   C2   C31  B5500 FALSE P005  NA
6  C1   C2   C32  B1000 TRUE  P006  NA
7  C1   C2   C32  B1000 FALSE P007  P007
8  C1   C2   C32  B1000 FALSE P008  NA
9  C1   C2   C32  B1000 FALSE P009  P009
10 C1   C2   C32  B1000 FALSE P010  P010
For each group, I want to take the category values (Cat1, Cat2, Cat3) from the row where key is TRUE.
In addition, I need to create Python-style list strings that combine all values in the pcode column and in the needed column respectively, excluding NA values.
Note that key is TRUE on the first row where mcode takes a new value.
Here is the expected output.
  mcode Cat1 Cat2 Cat3 type   extended_info
1 B1000 C1   C2   C32  pcode  ['P006','P007','P008','P009','P010']
2 B1000 C1   C2   C32  needed ['P007','P009','P010']
3 B3100 C1   C2   C31  pcode  ['P001','P002']
4 B3100 C1   C2   C31  needed ['P001','P002']
5 B5500 C1   C2   C31  pcode  ['P003','P004','P005']
6 B5500 C1   C2   C31  needed ['P003']
Here are tribble() calls that reproduce the data and the expected output.
library(tibble)  # provides tribble()

df <- tribble(
  ~Cat1, ~Cat2, ~Cat3, ~mcode,  ~key,  ~pcode, ~needed,
  "C1",  "C2",  "C31", "B3100", TRUE,  "P001", "P001",
  "C1",  "C2",  "C31", "B3100", FALSE, "P002", "P002",
  "C1",  "C2",  "C31", "B5500", TRUE,  "P003", "P003",
  "C1",  "C2",  "C31", "B5500", FALSE, "P004", NA,
  "C1",  "C2",  "C31", "B5500", FALSE, "P005", NA,
  "C1",  "C2",  "C32", "B1000", TRUE,  "P006", NA,
  "C1",  "C2",  "C32", "B1000", FALSE, "P007", "P007",
  "C1",  "C2",  "C32", "B1000", FALSE, "P008", NA,
  "C1",  "C2",  "C32", "B1000", FALSE, "P009", "P009",
  "C1",  "C2",  "C32", "B1000", FALSE, "P010", "P010"
)
expected_output <- tribble(
  ~mcode,  ~Cat1, ~Cat2, ~Cat3, ~type,    ~extended_info,
  "B1000", "C1",  "C2",  "C32", "pcode",  "['P006','P007','P008','P009','P010']",
  "B1000", "C1",  "C2",  "C32", "needed", "['P007','P009','P010']",
  "B3100", "C1",  "C2",  "C31", "pcode",  "['P001','P002']",
  "B3100", "C1",  "C2",  "C31", "needed", "['P001','P002']",
  "B5500", "C1",  "C2",  "C31", "pcode",  "['P003','P004','P005']",
  "B5500", "C1",  "C2",  "C31", "needed", "['P003']"
)
It looks like there is only one row with key = TRUE per mcode.
Something like this should do what you need:
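library(dplyr)  # the .by argument of summarise() needs dplyr >= 1.1.0
library(tidyr)  # provides pivot_longer()

expected_output <- df %>%
  # One row per mcode: categories come from the key row,
  # pcode/needed become list columns of the sorted, unique, non-NA values.
  summarise(Cat1 = first(Cat1[key]),
            Cat2 = first(Cat2[key]),
            Cat3 = first(Cat3[key]),
            pcode = list(sort(unique(pcode[!is.na(pcode)]))),
            needed = list(sort(unique(needed[!is.na(needed)]))),
            .by = mcode) %>%
  # Stack the two list columns so each mcode gets a pcode row and a needed row.
  pivot_longer(cols = c(pcode, needed), names_to = "type", values_to = "extended_info") %>%
  arrange(mcode)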
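
Note that the pipeline above leaves extended_info as a list column of character vectors, whereas the question asks for Python-style list strings. A minimal sketch of one way to get the strings directly is below. It assumes a hypothetical helper named to_py_list (not part of the answer or of any package) and uses a small stand-in tibble called demo rather than the full df.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: turn a character vector into a Python-style list
# string such as "['a','b']". NA values are dropped, and an input with no
# non-NA values yields "[]".
to_py_list <- function(x) {
  x <- sort(unique(x[!is.na(x)]))
  paste0("[", paste(sprintf("'%s'", x), collapse = ","), "]")
}

# Small stand-in for the question's data (two groups only).
demo <- tribble(
  ~Cat1, ~Cat2, ~Cat3, ~mcode,  ~key,  ~pcode, ~needed,
  "C1",  "C2",  "C31", "B5500", TRUE,  "P003", "P003",
  "C1",  "C2",  "C31", "B5500", FALSE, "P004", NA,
  "C1",  "C2",  "C32", "B1000", TRUE,  "P006", NA,
  "C1",  "C2",  "C32", "B1000", FALSE, "P007", "P007"
)

result <- demo %>%
  summarise(Cat1 = first(Cat1[key]),
            Cat2 = first(Cat2[key]),
            Cat3 = first(Cat3[key]),
            pcode = to_py_list(pcode),    # one string per group
            needed = to_py_list(needed),
            .by = mcode) %>%
  pivot_longer(cols = c(pcode, needed), names_to = "type", values_to = "extended_info") %>%
  arrange(mcode)

print(result)
```

Because the helper returns a single string per group, extended_info ends up as an ordinary character column, which should compare directly against the character column in the expected output.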