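If I use aggregate with a formula on a data frame, e.g.
aggregate(cbind(a,b,c)~d+e+f,df,sum)
does the order of the model columns (d, e, f) matter?
I ask because I have read on other sites that the order matters in reshape:
* The basic arguments of cast are the molten data and a formula of the form x1 + x2 ~ y1 + y2. The order of the variables matters: the first varies slowest and the last varies fastest.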
https://tgmstat.wordpress.com/2013/10/31/reshape-and-aggregate-data-with-the-r-package-reshape2/
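There appears to be a bug in aggregate.data.frame:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15116
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15004
I used the RData provided in the first link to reproduce the problem on R-2.15.3.
Changing the order of the variables changes the output: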
> str(data_sample)
'data.frame': 3 obs. of 14 variables:
$ status : Factor w/ 9 levels "200","302","303",..: 2 2 2
$ X.videoplayer : Factor w/ 169 levels "","1154414614001",..: 69 72 72
$ account_id : Factor w/ 7 levels "","1661991833001",..: 1 1 1
$ accountid : Factor w/ 118 levels "","1012353585001",..: 1 1 1
$ aifp : Factor w/ 2 levels "","v0002": 1 1 1
$ allowfullscreen: Factor w/ 2 levels "","false": 2 2 2
$ and_tags : Factor w/ 322 levels "","modelid:103",..: 1 1 1
$ assetid : Factor w/ 22 levels "","1880605246001",..: 1 1 1
$ audioonly : Factor w/ 2 levels "","false": 1 1 1
$ auth : Factor w/ 3 levels "","daeagc2c8codkdja2cucodma8c2boahcecq-bqratw-bwg-acbqrwnkhhhb",..: 1 1 1
$ bclid : Factor w/ 15 levels "","1151559591",..: 1 1 1
$ block : Factor w/ 2 levels "","true": 1 1 1
$ bytes_sent : Factor w/ 10084 levels "1000","10007",..: 9660 9692 9660
$ leafcount : num 1 1 1
> aggregate(cbind(leafcount)~status+X.videoplayer+account_id+accountid+aifp+allowfullscreen+and_tags+assetid+audioonly+auth+bclid+bytes_sent+block,data_sample,sum)
  status X.videoplayer account_id accountid aifp allowfullscreen and_tags assetid audioonly auth bclid bytes_sent block leafcount
1    302 1874470124001                                     false                                              758               1
2    302    1882714731                                     false                                              758               1
3    302    1882714731                                     false                                              772               1
> aggregate(cbind(leafcount)~status+X.videoplayer+account_id+accountid+aifp+allowfullscreen+and_tags+assetid+audioonly+auth+bclid+block+bytes_sent,data_sample,sum)
Error in `[[<-.data.frame`(`*tmp*`, len + i, value = c(2, 1)) :
replacement has 2 rows, data has 3
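It affects the order of both the rows and the columns, as the example below shows using the built-in CO2 data frame. From the output we can see that the columns named on the right-hand side of the formula are output in the order given in the formula, and that the rows are output in reverse odometer order, with the first column varying fastest.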
levels(CO2$Type)
## [1] "Quebec" "Mississippi"
levels(CO2$Treatment)
## [1] "nonchilled" "chilled"
aggregate(cbind(conc, uptake) ~ Treatment + Type, CO2, mean)
##    Treatment        Type conc   uptake
## 1 nonchilled      Quebec  435 35.33333
## 2    chilled      Quebec  435 31.75238
## 3 nonchilled Mississippi  435 25.95238
## 4    chilled Mississippi  435 15.81429
aggregate(cbind(conc, uptake) ~ Type + Treatment, CO2, mean)
##          Type  Treatment conc   uptake
## 1      Quebec nonchilled  435 35.33333
## 2 Mississippi nonchilled  435 25.95238
## 3      Quebec    chilled  435 31.75238
## 4 Mississippi    chilled  435 15.81429
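To check that only the ordering differs between the two calls, and not the aggregated values themselves, something like the following sketch could be used. It assumes a current base R with the built-in CO2 data; the names a1, a2, b1, b2, grid, same_values and same_row_order are made up for this illustration. It also compares the row order of the first result with expand.grid, which likewise varies its first argument fastest.

```r
# Aggregate the built-in CO2 data with the grouping variables in both orders.
a1 <- aggregate(cbind(conc, uptake) ~ Treatment + Type, CO2, mean)
a2 <- aggregate(cbind(conc, uptake) ~ Type + Treatment, CO2, mean)

# Bring both results into the same column and row order before comparing.
cols <- c("Type", "Treatment", "conc", "uptake")
b1 <- a1[order(a1$Type, a1$Treatment), cols]
b2 <- a2[order(a2$Type, a2$Treatment), cols]
rownames(b1) <- NULL
rownames(b2) <- NULL

# TRUE if the two calls computed the same group means.
same_values <- isTRUE(all.equal(b1, b2))

# expand.grid varies its first argument fastest, so its rows should line up
# with the rows of a1 (Treatment first, then Type).
grid <- expand.grid(Treatment = levels(CO2$Treatment),
                    Type = levels(CO2$Type))
same_row_order <- all(as.character(a1$Treatment) == as.character(grid$Treatment)) &&
  all(as.character(a1$Type) == as.character(grid$Type))

print(c(same_values = same_values, same_row_order = same_row_order))
```

If both flags come out TRUE, the order of the terms on the right-hand side only changes how the result is laid out, not what is computed. This says nothing about the aggregate.data.frame bug reported above for R-2.15.3, which needs the data from the bug report to reproduce.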