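For each unique "taxon_name", I want the mean (for numeric variables) or the mode (for character variables) of every variable listed in the "trait_name" column. Then I want to put these values into a table.
This is what my data frame currently looks like (I also have a wide version):
This is the output I want:
I wrote code that manually computes the mean or mode of each "trait_name" for each "taxon_name", like this: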
# create mode function: returns the most frequent value (ties go to the value seen first)
find.mode <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  val <- unique(x)
  return(val[which.max(tabulate(match(x, val)))])
}
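Before applying find.mode() to every taxon, it is worth checking how it behaves at the edges. Ties resolve to the value that appears first, and a vector with no non-missing values returns NA rather than failing. The function is repeated here so the snippet runs on its own.

```r
# find.mode() as defined above
find.mode <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  val <- unique(x)
  return(val[which.max(tabulate(match(x, val)))])
}

find.mode(c("tree", "shrub", "tree", NA))           # "tree"
find.mode(c("climber", "tree", "tree", "climber"))  # tie -> "climber", the first value seen
find.mode(c(NA_character_, NA_character_))          # NA: nothing left after dropping NAs
```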
# MEAN AND MODE OF TRAITS
## Acacia implexa
acacia_mass <- mean(species_traits_wide$seed_dry_mass[species_traits_wide$taxon_name == "Acacia implexa"], na.rm = TRUE)
acacia_length <- mean(species_traits_wide$seed_length[species_traits_wide$taxon_name == "Acacia implexa"], na.rm = TRUE)
acacia_form <- find.mode(species_traits_wide$plant_growth_form[species_traits_wide$taxon_name == "Acacia implexa"])
acacia_dormancy <- find.mode(species_traits_wide$seed_dormancy_class[species_traits_wide$taxon_name == "Acacia implexa"])
acacia_treatment <- find.mode(species_traits_wide$seed_germination_treatment[species_traits_wide$taxon_name == "Acacia implexa"])
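But I need to automate this for every species in the data frame, including any new species I add later.
Both the long and wide versions of my data can be downloaded here: https://drive.google.com/drive/folders/15hD3Zk2DlXsvWhI-WyQYMTm49ZZB0Ip0?usp=drive_link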
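One way to remove the per-species copies is to split the wide data frame by taxon_name and apply the same mean-or-mode rule to every trait column. Below is a minimal base R sketch, assuming the wide layout has one taxon_name column plus one column per trait. summarise_trait() is a hypothetical helper name, and the toy rows are made up for illustration.

```r
# Same logic as find.mode() in the question
find.mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  val <- unique(x)
  val[which.max(tabulate(match(x, val)))]
}

# Hypothetical helper: mean for numeric columns, mode for everything else
summarise_trait <- function(x) {
  if (is.numeric(x)) mean(x, na.rm = TRUE) else find.mode(x)
}

# Made-up toy data standing in for species_traits_wide
species_traits_wide <- data.frame(
  taxon_name        = c("Acacia implexa", "Acacia implexa", "Themeda triandra"),
  seed_dry_mass     = c(8, 10, 4),
  plant_growth_form = c("tree", "tree", "tussock"),
  stringsAsFactors  = FALSE
)

trait_cols <- setdiff(names(species_traits_wide), "taxon_name")

# One row per taxon: split by taxon, summarise every trait column, then stack the rows
rows <- lapply(split(species_traits_wide, species_traits_wide$taxon_name), function(d) {
  data.frame(taxon_name = d$taxon_name[1],
             lapply(d[trait_cols], summarise_trait),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

Because each column is summarised separately, numeric traits stay numeric and character traits stay character. New species are picked up automatically by split().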
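Because the long version already has a trait_name column, the table can also be built from it directly. The base R sketch below assumes the long data has columns taxon_name, trait_name and a column called value (a hypothetical name) that holds every trait as text. A trait is averaged only when all of its non-missing values parse as numbers; otherwise its mode is taken. The result is a character matrix with one row per taxon and one column per trait. The rows shown are made up.

```r
# Same logic as find.mode() in the question
find.mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  val <- unique(x)
  val[which.max(tabulate(match(x, val)))]
}

# Mean if every non-missing value is numeric text, otherwise the mode
summarise_value <- function(v) {
  num <- suppressWarnings(as.numeric(v))
  if (any(!is.na(v)) && all(is.na(num) == is.na(v))) {
    as.character(mean(num, na.rm = TRUE))
  } else {
    find.mode(v)
  }
}

# Made-up long-format rows; "value" is a hypothetical column name
traits_long <- data.frame(
  taxon_name = c("Acacia implexa", "Acacia implexa", "Acacia implexa",
                 "Acacia implexa", "Acacia implexa", "Themeda triandra"),
  trait_name = c("seed_dry_mass", "seed_dry_mass", "plant_growth_form",
                 "plant_growth_form", "plant_growth_form", "plant_growth_form"),
  value      = c("8", "10", "tree", "tree", "shrub", "tussock"),
  stringsAsFactors = FALSE
)

# Rows = taxa, columns = traits; taxon/trait combinations with no data become NA
trait_table <- tapply(traits_long$value,
                      list(traits_long$taxon_name, traits_long$trait_name),
                      summarise_value)
trait_table
```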
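library(tidyverse)
df <- read_csv("species_traits_wide.csv")
# mean for numeric columns, mode (find.mode() from the question) otherwise; as.character() makes every result <chr>
df |>
  summarise(across(2:6, ~ if (is.numeric(.x)) as.character(mean(.x, na.rm = TRUE)) else find.mode(.x, na.rm = TRUE)),
            .by = taxon_name)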
Output:
# A tibble: 5 × 6
taxon_name plant_growth_form seed_dormancy_class seed_dry_mass
<chr> <chr> <chr> <chr>
1 Acacia implexa tree physical_dormancy 8.967897142857…
2 Casuarina cunninghamiana tree non_dormant 0.632023076923…
3 Eucalyptus viminalis tree NA 67.32578571428…
4 Hardenbergia violacea climber physical_dormancy 18.04795959595…
5 Themeda triandra tussock NA 4.375857352941…
# ℹ 2 more variables: seed_germination_treatment <chr>, seed_length <chr>
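Because both branches go through as.character(), seed_dry_mass and seed_length come back as <chr>. The sketch below keeps numeric traits numeric by selecting columns by type instead. It assumes dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for `|>` and `\(x)`). The toy tibble is made up.

```r
library(dplyr)

# Same logic as find.mode() in the question
find.mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  val <- unique(x)
  val[which.max(tabulate(match(x, val)))]
}

# Made-up toy data with the same kind of columns as species_traits_wide
toy <- tibble(
  taxon_name        = c("Acacia implexa", "Acacia implexa", "Themeda triandra"),
  plant_growth_form = c("tree", "tree", "tussock"),
  seed_dry_mass     = c(8, 10, 4)
)

# Numeric columns -> mean, character columns -> mode.
# taxon_name is the grouping column, so across() does not select it.
out <- toy |>
  summarise(
    across(where(is.numeric), \(x) mean(x, na.rm = TRUE)),
    across(where(is.character), \(x) find.mode(x)),
    .by = taxon_name
  )
out
```

Selecting by type with where() means new trait columns are handled without changing the column positions in across().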