I have a sample dataset with 2 columns, Dates and eVal, as shown below:
eVal Dates
0 3.622833 2015-01-01
1 3.501333 2015-01-01
2 3.469167 2015-01-01
3 3.436333 2015-01-01
4 3.428000 2015-01-01
5 3.400667 2015-01-01
6 3.405667 2015-01-01
7 3.401500 2015-01-01
8 3.404333 2015-01-01
9 3.424833 2015-01-01
10 3.489500 2015-01-01
11 3.521000 2015-01-01
12 3.527833 2015-01-01
13 3.523500 2015-01-01
14 3.511667 2015-01-01
15 3.602500 2015-01-01
16 3.657667 2015-01-01
17 3.616667 2015-01-01
18 3.534500 2015-01-01
19 3.529167 2015-01-01
20 3.548167 2015-01-01
21 3.565500 2015-01-01
22 3.539833 2015-01-01
23 3.485667 2015-01-01
24 3.493167 2015-01-02
25 3.434667 2015-01-02
26 3.422500 2015-01-02
... ...
3304546 3.166000 2015-01-31
3304547 3.138500 2015-01-31
3304548 3.128000 2015-01-31
3304549 3.078833 2015-01-31
3304550 3.106000 2015-01-31
3304551 3.116167 2015-01-31
3304552 3.087500 2015-01-31
3304553 3.089167 2015-01-31
3304554 3.126667 2015-01-31
3304555 3.191667 2015-01-31
3304556 3.227500 2015-01-31
3304557 3.263833 2015-01-31
3304558 3.263667 2015-01-31
3304559 3.255333 2015-01-31
3304560 3.265500 2015-01-31
3304561 3.234167 2015-01-31
3304562 3.231167 2015-01-31
3304563 3.236333 2015-01-31
3304564 3.274667 2015-01-31
3304565 3.223167 2015-01-31
3304566 3.238333 2015-01-31
3304567 3.235000 2015-01-31
3304568 3.227333 2015-01-31
3304569 3.185333 2015-01-31
I want to aggregate by day and compute the mean of the eVal column for each day. I tried:
me = time['eVal'].groupby(time['Dates']).mean()
But it returns the wrong means:
me.head(10)
Out[149]:
Dates
2015-01-01 4.014973
2015-01-02 4.006548
2015-01-03 4.010406
2015-01-04 4.034531
2015-01-05 3.988262
2015-01-06 3.972111
2015-01-07 3.989347
2015-01-08 3.959556
2015-01-09 3.995394
2015-01-10 4.048786
Name: eVal, dtype: float64
If I apply describe to the groupby, the groups are not correct: the maximum and minimum values reported for each day are wrong.
You can use the following line of code.
time.groupby('Dates').mean()
I tried this on your sample, and here is the sample output.
eVal Dates
2015-01-01 3.506160
2015-01-02 3.450111
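To make the check above reproducible, here is a minimal sketch that rebuilds a small subset of the sample rows (three per day, not the full data) and groups them by day. The `pd.to_datetime(...).dt.normalize()` step and the `group_sizes` check are my own additions, assuming Dates might be stored as strings or carry a time component. Keep in mind that the sample shows only 24 rows for 2015-01-01 while the full frame has more than 3.3 million rows, so means computed on the sample need not match means computed on the full data.

```python
import pandas as pd

# A small subset of the sample rows from the question (not the full data set).
time = pd.DataFrame({
    "eVal": [3.622833, 3.501333, 3.469167, 3.493167, 3.434667, 3.422500],
    "Dates": ["2015-01-01"] * 3 + ["2015-01-02"] * 3,
})

# Assumption: Dates may be strings or timestamps with a time part.
# Converting to datetime and normalizing to midnight gives one key per day.
time["Dates"] = pd.to_datetime(time["Dates"]).dt.normalize()

# Mean of eVal for each day.
daily_mean = time.groupby("Dates")["eVal"].mean()

# Number of rows per day, useful for checking that the groups are what you expect.
group_sizes = time.groupby("Dates").size()

print(daily_mean)
print(group_sizes)
```

If `group_sizes` on the real data shows far more rows per day than the 24 visible in the sample, the difference in means probably comes from the rows that are not shown, not from groupby itself.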