我有一个整齐的数据格式的物种数据。要包含在报告中,我需要通过列出每个组只有一次更高的订单(王国,门,类等)来减少表的宽度。
我目前有:
...或类似的东西:
......每个更高阶只给出一次,每个物种都在下面列出的更高阶。
这个列表很长,所以需要基于脚本。我已经看过使用dplyr
,但没有看到实现这一目标的方法。
如果需要,下面是可重现的示例数据。
exampledata <- structure(list(KINGDOM = c("Animalia", "Animalia", "Animalia",
"Animalia", "Animalia", "Animalia", "Animalia", "Animalia", "Animalia",
"Animalia", "Animalia", "Animalia"), PHYLYM = c("Chordata", "Chordata",
"Chordata", "Chordata", "Chordata", "Chordata", "Chordata", "Chordata",
"Chordata", "Chordata", "Chordata", "Chordata"), CLASS = c("Amphibia",
"Amphibia", "Amphibia", "Amphibia", "Amphibia", "Aves", "Aves",
"Aves", "Aves", "Aves", "Aves", "Aves"), ORDER = c("Anura", "Anura",
"Anura", "Anura", "Anura", "Accipitriformes", "Ciconiiformes",
"Gruiformes", "Passeriformes", "Passeriformes", "Pelecaniformes",
"Pelecaniformes"), FAMILY = c("Ranidae", "Ranidae", "Rhacophoridae",
"Rhacophoridae", "Rhacophoridae", "Accipitridae", "Ciconiidae",
"Gruidae", "Muscicapidae", "Muscicapidae", "Threskiornithidae",
"Threskiornithidae"), SCIENTIFICNAME = c("Hylarana attigua",
"Hylarana taipehensis", "Philautus", "Polypedates leucomystax",
"Theloderma asperum", "Aviceda jerdoni", "Leptoptilos javanicus",
"Antigone antigone", "Cyanoptila cyanomelana", "Cyornis hainanus",
"Pseudibis davisoni", "Thaumatibis gigantea"), OTHERDATA = c("XYZ",
"ABC", "XYZ", "ABC", "XYZ", "XYZ", "ABC", "XYZ", "ABC", "ABC",
"XYZ", "XYZ")), row.names = c(NA, 12L), class = "data.frame")
删除数据通常是一个坏主意,但我看到了用例。
如果您的数据已按正确顺序排列,则可以执行以下操作:
iris %>%
mutate(Species = if_else(duplicated(Species),"", as.character(Species)))
请注意,仅需要as.character()
,因为Species
是此数据集中的一个因子。
编辑例如数据:
exampledata %>%
mutate_at(vars("KINGDOM", "PHYLYM", "CLASS","ORDER", "FAMILY", "SCIENTIFICNAME"), ~ if_else(duplicated(.x),"", as.character(.x)) )
给出一个这样的表格:
KINGDOM PHYLYM CLASS ORDER FAMILY SCIENTIFICNAME OTHERDATA
1 Animalia Chordata Amphibia Anura Ranidae Hylarana attigua XYZ
2 Hylarana taipehensis ABC
3 Rhacophoridae Philautus XYZ
4 Polypedates leucomystax ABC
5 Theloderma asperum XYZ
6 Aves Accipitriformes Accipitridae Aviceda jerdoni XYZ
7 Ciconiiformes Ciconiidae Leptoptilos javanicus ABC
8 Gruiformes Gruidae Antigone antigone XYZ
9 Passeriformes Muscicapidae Cyanoptila cyanomelana ABC
10 Cyornis hainanus ABC
11 Pelecaniformes Threskiornithidae Pseudibis davisoni XYZ
12 Thaumatibis gigantea XYZ
如果你想减少数据,我会建议group_by
更高的订单,并将其他细节保存为逗号分隔的字符串,而不是有空格。
library(dplyr)
exampledata %>%
group_by(KINGDOM, PHYLYM, CLASS, ORDER, FAMILY) %>%
summarise_at(vars(SCIENTIFICNAME, OTHERDATA), toString)
# KINGDOM PHYLYM CLASS ORDER FAMILY SCIENTIFICNAME OTHERDATA
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 Animalia Chordata Amphibia Anura Ranidae Hylarana attigua, Hylarana taipehensis XYZ, ABC
#2 Animalia Chordata Amphibia Anura Rhacophoridae Philautus, Polypedates leucomystax, Theloderm… XYZ, ABC, X…
#3 Animalia Chordata Aves Accipitrifor… Accipitridae Aviceda jerdoni XYZ
#4 Animalia Chordata Aves Ciconiiformes Ciconiidae Leptoptilos javanicus ABC
#5 Animalia Chordata Aves Gruiformes Gruidae Antigone antigone XYZ
#6 Animalia Chordata Aves Passeriformes Muscicapidae Cyanoptila cyanomelana, Cyornis hainanus ABC, ABC
#7 Animalia Chordata Aves Pelecaniform… Threskiornith… Pseudibis davisoni, Thaumatibis gigantea XYZ, XYZ
使用此方法,您不会丢失任何信息,也会减少数据框中的行数。您可以根据自己的喜好在group_by
和summarise_at
中添加/删除列。