dplyr/data.table:如何计算包含 R 中因子的两个向量每组观察计数的平均值

问题描述 投票:0回答:2

问题

我想对我的数据进行卡方检验,以探讨国家之间

whistle types (A-F)

之间是否存在统计差异
'France' and 'Germany'
参见下面的可重复数据框)。

我想制作一个与

Whistle Type Count table

完全相同的表格(见下文),但我想使用以下任一方法计算每个国家每种口哨类型(A-F)的平均值,而不是每种口哨类型的观察计数包 
dplyr
data.table
。向量 
Country
Whistle Type
 都是因子。

对于此分析,我如何计算出如何计算每个国家每种口哨类型的平均值?

口哨类型计数表

library(dplyr) #Count the number of Whistle types per country Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE) Counts of Observations per whistle type Whistle_Type Country n 1 F Germany 38 2 E France 37 3 B France 33 4 C Germany 33 5 B Germany 31 6 A France 27 7 F France 27 8 A Germany 25 9 D France 23 10 C France 21 11 D Germany 21 12 E Germany 19
我需要的是这样的东西(这些平均值是组成的),显示每个国家每种口哨类型的平均值:

Whistle_Type Country Mean 1 F Germany 14.9 2 E France 12.4 3 B France 9.6

可重现的 R 代码

#Dummy data #Create a cluster column with dummy data (clusters = 3) f1 <- gl(n = 2, k=167.5); f1 #Produce a data frame for the dummy level data f2<-as.data.frame(f1) #Rename the column f2 colnames(f2)<-"Country" #How many rows nrow(f2) #Rename the levels of the dependent variable 'Country' as classifiers #prefer the inputs to be factors levels(f2$Country) <- c("France", "Germany") #Add a vector called Whistle Types #Add a vector called Behaviors Whistle_Types<-sample(c('A', 'B', 'C', 'D', 'E', 'F'), 335, replace=TRUE) #Create random numbers Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000) End.Freq<-runif(335, min=3.750000e+02, max=65310.000000) Delta.Time<-runif(335, min=2.192504e-02, max=3.155762) Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000) High.Freq<-runif(335, min=2.051000e+03, max=36388.450000) Peak.Freq<-runif(335, min=7.324220+02, max=35595.703000) Center.Freq<-runif(335, min=2.190000e-02, max=3.155800) Delta.Freq<-runif(335, min=1.171875+03, max=30761.719000) Delta.Time<-runif(335, min=2.192504e-02, max=3.155762) #Bind the columns together Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types) #Rename the columns colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq', 'Delta.Freq', 'Delta.Time',"Whistle_Type") #Produce a dataframe Whistle_Parameters<-as.data.frame(Bind) Whistle_Parameters
    
r vector dplyr data.table mean
2个回答
2
投票
library(data.table) setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]

library(dplyr) Whistle_Parameters %>% group_by(Country, Whistle_Type) %>% summarize(across(everything(), mean))
    

2
投票

data.table

setDT(Whistle_Parameters) # vector of variable to average var_to_mean <- setdiff(names(Whistle_Parameters),c("Country","Whistle_Type")) # mean per Country and Whistle_type, for each column Whistle_Parameters[,lapply(.SD,mean),.SDcols = var_to_mean,by = .(Country,Whistle_Type)] Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Center.Freq Delta.Freq Delta.Time 1: France B 12518.44 31489.32 11258.852 20716.42 17208.48 1.527760 12618.33 1.851037 2: France C 10843.06 36625.38 10069.604 18183.59 18561.31 1.628858 15495.80 1.485466 3: France F 11232.58 31955.38 10833.592 20992.41 19007.11 1.562517 17165.26 1.729563 4: France E 10418.82 34813.67 11325.300 15257.26 17749.05 1.614468 18908.37 1.549006 5: France D 14526.32 29232.38 10738.944 16636.51 16756.75 2.004199 14306.67 1.608868 6: France A 14378.98 37006.09 10261.817 16479.82 18099.49 1.467998 12310.01 1.538442 7: Germany C 10866.43 40694.70 10420.398 16017.48 17431.95 1.721555 14245.38 1.195733 8: Germany E 14830.39 38750.77 10313.795 19415.02 20879.32 1.808080 17165.47 1.207900 9: Germany F 11270.00 27803.95 10173.037 19908.47 18155.31 1.375614 13727.61 1.528704 10: Germany B 12186.73 30519.04 9743.094 17460.80 19031.94 1.700328 17530.92 1.782998 11: Germany A 11639.71 34723.41 10769.073 18420.37 13990.86 1.485000 15683.72 1.488470 12: Germany D 12461.53 41251.08 8739.983 21661.02 15025.17 1.515318 12918.67 1.516498

dplyr

library(dplyr) Whistle_Parameters %>% group_by(Country,Whistle_Type) %>% summarise(across(var_to_mean,mean)) `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument. # A tibble: 12 x 10 # Groups: Country [2] Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Cente~1 Delta~2 Delta~3 <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 France A 14379. 37006. 10262. 16480. 18099. 1.47 12310. 1.54 2 France B 12518. 31489. 11259. 20716. 17208. 1.53 12618. 1.85 3 France C 10843. 36625. 10070. 18184. 18561. 1.63 15496. 1.49 4 France D 14526. 29232. 10739. 16637. 16757. 2.00 14307. 1.61 5 France E 10419. 34814. 11325. 15257. 17749. 1.61 18908. 1.55 6 France F 11233. 31955. 10834. 20992. 19007. 1.56 17165. 1.73 7 Germany A 11640. 34723. 10769. 18420. 13991. 1.49 15684. 1.49 8 Germany B 12187. 30519. 9743. 17461. 19032. 1.70 17531. 1.78 9 Germany C 10866. 40695. 10420. 16017. 17432. 1.72 14245. 1.20 10 Germany D 12462. 41251. 8740. 21661. 15025. 1.52 12919. 1.52 11 Germany E 14830. 38751. 10314. 19415. 20879. 1.81 17165. 1.21 12 Germany F 11270. 27804. 10173. 19908. 18155. 1.38 13728. 1.53
    
© www.soinside.com 2019 - 2024. All rights reserved.