dplyr/data.table：如何计算包含 R 中因子的两个向量每组观察计数的平均值

Question

问题

我想对我的数据进行卡方检验，以探讨国家之间

whistle types (A-F)

之间是否存在统计差异

'France' and 'Germany'

（

参见下面的可重复数据框）。

我想制作一个与

Whistle Type Count table

完全相同的表格（见下文），但我想使用以下任一方法计算每个国家每种口哨类型（A-F）的平均值，而不是每种口哨类型的观察计数包

dplyr

或

data.table

。向量

Country

和

Whistle Type

 都是因子。

对于此分析，我如何计算出如何计算每个国家每种口哨类型的平均值？

口哨类型计数表

library(dplyr)

#Count the number of Whistle types per country
Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE)

Counts of Observations per whistle type 

   Whistle_Type Country  n
1             F Germany 38
2             E  France 37
3             B  France 33
4             C Germany 33
5             B Germany 31
6             A  France 27
7             F  France 27
8             A Germany 25
9             D  France 23
10            C  France 21
11            D Germany 21
12            E Germany 19

我需要的是这样的东西（这些平均值是组成的），显示每个国家每种口哨类型的平均值：

   Whistle_Type Country  Mean
1             F Germany 14.9
2             E  France 12.4
3             B  France  9.6

可重现的 R 代码

#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1

#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)

#Rename the column f2
colnames(f2)<-"Country"

#How many rows
nrow(f2)

#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")

#Add a vector called Whistle Types
#Add a vector called Behaviors
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
                     'E', 'F'), 335, replace=TRUE)

#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)

#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq,  Low.Freq, High.Freq, Peak.Freq,  Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)

#Rename the columns 
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq', 
                  'Delta.Freq', 'Delta.Time',"Whistle_Type")

#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters

Answer 1

library(data.table)
setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]

或

library(dplyr)
Whistle_Parameters %>% 
  group_by(Country, Whistle_Type) %>% 
  summarize(across(everything(), mean))

Answer 2

与

data.table

：

setDT(Whistle_Parameters)
# vector of variable to average
var_to_mean <- setdiff(names(Whistle_Parameters),c("Country","Whistle_Type"))
# mean per Country and Whistle_type, for each column
Whistle_Parameters[,lapply(.SD,mean),.SDcols = var_to_mean,by = .(Country,Whistle_Type)]

   Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Center.Freq Delta.Freq Delta.Time
 1:  France            B 12518.44  31489.32  11258.852 20716.42  17208.48    1.527760   12618.33   1.851037
 2:  France            C 10843.06  36625.38  10069.604 18183.59  18561.31    1.628858   15495.80   1.485466
 3:  France            F 11232.58  31955.38  10833.592 20992.41  19007.11    1.562517   17165.26   1.729563
 4:  France            E 10418.82  34813.67  11325.300 15257.26  17749.05    1.614468   18908.37   1.549006
 5:  France            D 14526.32  29232.38  10738.944 16636.51  16756.75    2.004199   14306.67   1.608868
 6:  France            A 14378.98  37006.09  10261.817 16479.82  18099.49    1.467998   12310.01   1.538442
 7: Germany            C 10866.43  40694.70  10420.398 16017.48  17431.95    1.721555   14245.38   1.195733
 8: Germany            E 14830.39  38750.77  10313.795 19415.02  20879.32    1.808080   17165.47   1.207900
 9: Germany            F 11270.00  27803.95  10173.037 19908.47  18155.31    1.375614   13727.61   1.528704
10: Germany            B 12186.73  30519.04   9743.094 17460.80  19031.94    1.700328   17530.92   1.782998
11: Germany            A 11639.71  34723.41  10769.073 18420.37  13990.86    1.485000   15683.72   1.488470
12: Germany            D 12461.53  41251.08   8739.983 21661.02  15025.17    1.515318   12918.67   1.516498

与

dplyr

：


library(dplyr)

Whistle_Parameters %>%
  group_by(Country,Whistle_Type) %>%
  summarise(across(var_to_mean,mean))

`summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
# A tibble: 12 x 10
# Groups:   Country [2]
   Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Cente~1 Delta~2 Delta~3
   <fct>   <chr>           <dbl>     <dbl>      <dbl>    <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
 1 France  A              14379.    37006.     10262.   16480.    18099.    1.47  12310.    1.54
 2 France  B              12518.    31489.     11259.   20716.    17208.    1.53  12618.    1.85
 3 France  C              10843.    36625.     10070.   18184.    18561.    1.63  15496.    1.49
 4 France  D              14526.    29232.     10739.   16637.    16757.    2.00  14307.    1.61
 5 France  E              10419.    34814.     11325.   15257.    17749.    1.61  18908.    1.55
 6 France  F              11233.    31955.     10834.   20992.    19007.    1.56  17165.    1.73
 7 Germany A              11640.    34723.     10769.   18420.    13991.    1.49  15684.    1.49
 8 Germany B              12187.    30519.      9743.   17461.    19032.    1.70  17531.    1.78
 9 Germany C              10866.    40695.     10420.   16017.    17432.    1.72  14245.    1.20
10 Germany D              12462.    41251.      8740.   21661.    15025.    1.52  12919.    1.52
11 Germany E              14830.    38751.     10314.   19415.    20879.    1.81  17165.    1.21
12 Germany F              11270.    27804.     10173.   19908.    18155.    1.38  13728.    1.53

dplyr/data.table：如何计算包含 R 中因子的两个向量每组观察计数的平均值

问题描述投票：0回答：2

2个回答

最新问题

dplyr/data.table：如何计算包含 R 中因子的两个向量每组观察计数的平均值

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2