Order of model terms in the formula of R's aggregate function

Question (votes: 0, answers: 2)

If I use aggregate with a formula on a data frame, for example:

aggregate(cbind(a,b,c)~d+e+f,df,sum)

does the order of the model columns (d, e, f) matter?

I ask because I read on another site that the order of the variables matters in reshape:

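* The basic arguments of cast are the molten data and a formula of the form x1 + x2 ~ y1 + y2. The order of the variables matters: the first varies slowest and the last varies fastest.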

https://tgmstat.wordpress.com/2013/10/31/reshape-and-aggregate-data-with-the-r-package-reshape2/

r aggregate
2 answers
0 votes

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15116

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15004

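It looks like there is a bug in aggregate.data.frame (see the two bug reports linked above).

I used the RData file provided in the first link to reproduce the problem on R-2.15.3.

Changing the order of the variables changes the output: the first call below succeeds, while the second, which only swaps block and bytes_sent, fails with an error.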

> str(data_sample)
'data.frame':   3 obs. of  14 variables:
 $ status         : Factor w/ 9 levels "200","302","303",..: 2 2 2
 $ X.videoplayer  : Factor w/ 169 levels "","1154414614001",..: 69 72 72
 $ account_id     : Factor w/ 7 levels "","1661991833001",..: 1 1 1
 $ accountid      : Factor w/ 118 levels "","1012353585001",..: 1 1 1
 $ aifp           : Factor w/ 2 levels "","v0002": 1 1 1
 $ allowfullscreen: Factor w/ 2 levels "","false": 2 2 2
 $ and_tags       : Factor w/ 322 levels "","modelid:103",..: 1 1 1
 $ assetid        : Factor w/ 22 levels "","1880605246001",..: 1 1 1
 $ audioonly      : Factor w/ 2 levels "","false": 1 1 1
 $ auth           : Factor w/ 3 levels "","daeagc2c8codkdja2cucodma8c2boahcecq-bqratw-bwg-acbqrwnkhhhb",..: 1 1 1
 $ bclid          : Factor w/ 15 levels "","1151559591",..: 1 1 1
 $ block          : Factor w/ 2 levels "","true": 1 1 1
 $ bytes_sent     : Factor w/ 10084 levels "1000","10007",..: 9660 9692 9660
 $ leafcount      : num  1 1 1
> aggregate(cbind(leafcount)~status+X.videoplayer+account_id+accountid+aifp+allowfullscreen+and_tags+assetid+audioonly+auth+bclid+bytes_sent+block,data_sample,sum)
  status X.videoplayer account_id accountid aifp allowfullscreen and_tags assetid audioonly auth bclid bytes_sent block leafcount
1    302 1874470124001                                     false                                              758               1
2    302    1882714731                                     false                                              758               1
3    302    1882714731                                     false                                              772               1
> aggregate(cbind(leafcount)~status+X.videoplayer+account_id+accountid+aifp+allowfullscreen+and_tags+assetid+audioonly+auth+bclid+block+bytes_sent,data_sample,sum)
Error in `[[<-.data.frame`(`*tmp*`, len + i, value = c(2, 1)) : 
  replacement has 2 rows, data has 3
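
To check whether your own R version and data show this kind of order dependence, a minimal sketch along the following lines may help. The data frame toy and its columns g1, g2, g3 and x are made up for illustration (they are not the data from the bug report), and try_aggregate is a hypothetical helper, not part of any package. It runs the same aggregation with two term orders and catches errors instead of stopping.

```r
# Hypothetical toy data: g1, g2, g3 and x are made-up names.
# g2 contains an empty-string level, similar in spirit to the sample data.
toy <- data.frame(
  g1 = factor(c("a", "a", "b", "b")),
  g2 = factor(c("", "", "", "u")),
  g3 = factor(c("p", "q", "p", "q")),
  x  = c(1, 1, 1, 1)
)

# Run aggregate() and return either the result or the error message.
try_aggregate <- function(formula, data) {
  tryCatch(aggregate(formula, data, sum),
           error = function(e) conditionMessage(e))
}

r1 <- try_aggregate(x ~ g1 + g2 + g3, toy)
r2 <- try_aggregate(x ~ g1 + g3 + g2, toy)

if (is.data.frame(r1) && is.data.frame(r2)) {
  # Both orders worked: they should yield the same number of groups
  # and the same grand total.
  print(nrow(r1) == nrow(r2))
  print(sum(r1$x) == sum(r2$x))
} else {
  # At least one order failed: show what came back for each.
  print(r1)
  print(r2)
}
```

If the two orders disagree in anything other than row and column order, that would point to a problem like the one reported above rather than to intended behaviour.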

0 votes

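It affects the order of both the rows and the columns, as the example below shows using the built-in CO2 data frame. From the output we can see that the columns named on the right-hand side of the formula appear in the order given in the formula, and that the rows appear in reverse odometer order, with the first column varying fastest.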

levels(CO2$Type)
## [1] "Quebec"      "Mississippi"

levels(CO2$Treatment)
## [1] "nonchilled" "chilled"   

aggregate(cbind(conc, uptake) ~ Treatment + Type, CO2, mean)
##    Treatment        Type conc   uptake
## 1 nonchilled      Quebec  435 35.33333
## 2    chilled      Quebec  435 31.75238
## 3 nonchilled Mississippi  435 25.95238
## 4    chilled Mississippi  435 15.81429

aggregate(cbind(conc, uptake) ~ Type + Treatment, CO2, mean)
##          Type  Treatment conc   uptake
## 1      Quebec nonchilled  435 35.33333
## 2 Mississippi nonchilled  435 25.95238
## 3      Quebec    chilled  435 31.75238
## 4 Mississippi    chilled  435 15.81429
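
As a rough check that only the ordering differs and not the aggregated values, one could bring both results into a common column and row order and compare them. This is a sketch using only base R; the variable names a1, a2, s1, s2 and cols are my own.

```r
a1 <- aggregate(cbind(conc, uptake) ~ Treatment + Type, CO2, mean)
a2 <- aggregate(cbind(conc, uptake) ~ Type + Treatment, CO2, mean)

# Put both results into the same column order and the same row order.
cols <- c("Type", "Treatment", "conc", "uptake")
s1 <- a1[order(a1$Type, a1$Treatment), cols]
s2 <- a2[order(a2$Type, a2$Treatment), cols]

# Drop the original row names so only the contents are compared.
rownames(s1) <- NULL
rownames(s2) <- NULL

print(all.equal(s1, s2))
```

If the comparison reports a difference after sorting, the term order is changing more than the layout of the output.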