Pandas plot of a grouped DataFrame

Question

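I have the following code, which uses data exported from Apple Health. The data comes from exporting Apple Health to an export.zip file; in the code you can see that I extract the apple_health_export/export.xml file and import it as a DataFrame.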

import zipfile
import pandas
import matplotlib.pyplot as plt
import numpy
with zipfile.ZipFile('/Users/steven/Downloads/export.zip') as myzip:
    with myzip.open('apple_health_export/export.xml') as myfile:
        x = pandas.read_xml(myfile, xpath='//Record', attrs_only=True,parse_dates=["creationDate","startDate","endDate"])
x.value=pandas.to_numeric(x.value,errors='coerce')
x = x[x.value.notnull()]
data = x[x.type == 'HKQuantityTypeIdentifierStepCount']
plt.figure()
plt.rcParams.update({'font.family':'Avenir'})
data.plot(title='Daily Steps',grid=True,x="endDate",y="value",kind="scatter",fontsize="8",figsize=(11,5),xlim=(pandas.to_datetime('2024-08-01'),pandas.to_datetime('today')))

Plotting the individual data points is easy, as the last line above shows, but I run into trouble when I try to plot them grouped by day:

df = data.groupby(pandas.Grouper(key='endDate', axis=0, freq='D')).sum('value')

                             value
endDate                           
2023-11-01 00:00:00-04:00   6284.0
2023-11-02 00:00:00-04:00   3477.0
2023-11-03 00:00:00-04:00    522.0
2023-11-04 00:00:00-04:00    760.0
2023-11-05 00:00:00-04:00  14220.0
...                            ...
2024-09-07 00:00:00-04:00    916.0
2024-09-08 00:00:00-04:00   5981.0
2024-09-09 00:00:00-04:00   1012.0
2024-09-10 00:00:00-04:00  14018.0
2024-09-11 00:00:00-04:00    298.0

[316 rows x 1 columns]

If I try

data.groupby(pandas.Grouper(key='endDate', axis=0, freq='D')).plot('value')

I get a separate chart for every day, rather than a single chart with one data point per day along the x-axis.

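Call me a dummy, but how do I get this grouped data onto a single chart (without pulling all of this data into a database and using a SQL GROUP BY, since I'm trying to avoid extra steps)?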

Answer 1

You still need to aggregate; then you can plot:

# The question imports the library as `pandas`, not `pd`
(data.groupby(pandas.Grouper(key='endDate', axis=0, freq='D')).sum('value')
     .plot()
)

Output: [plot image]
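To try this without an Apple Health export, here is a minimal sketch on synthetic data. The column names mirror the question, but the timestamps and step counts are made up, and resample is shown only as an alternative spelling of the same daily aggregation, not as part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy
import pandas

# Synthetic stand-in for the step records: one record every 6 hours for 10 days.
rng = numpy.random.default_rng(0)
stamps = pandas.date_range("2024-08-01", periods=40, freq="6h", tz="UTC")
data = pandas.DataFrame({
    "endDate": stamps,
    "value": rng.integers(0, 500, size=40).astype(float),
})

# Aggregate first (one row per calendar day), then make a single plot call.
daily = data.groupby(pandas.Grouper(key="endDate", freq="D"))["value"].sum()
ax = daily.plot(title="Daily Steps", grid=True)

# resample expresses the same daily aggregation.
daily_resampled = data.resample("D", on="endDate")["value"].sum()
```

Calling .plot() on the GroupBy object itself draws once per group, which is why the question ended up with one chart per day; calling it on the aggregated result draws once.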

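If you want multiple lines based on groups (for example one line per month, with the day of the month on the x-axis), you first need to reshape your dataset (for example with a pivot):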

(data.assign(period=data['endDate'].dt.to_period('M'),
             day=data['endDate'].dt.day,
            )
     .pivot_table(index='day', columns='period', values='value',
                  aggfunc='sum')
     .plot() 
)
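As a rough illustration of what the reshaped table looks like, the sketch below runs the same pivot_table call on made-up data (two records of value 1.0 per day across August and September 2024). The names wide and ax are introduced here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy
import pandas

# Synthetic records: two per day across August and September 2024.
stamps = pandas.date_range("2024-08-01", "2024-09-30 12:00", freq="12h")
data = pandas.DataFrame({"endDate": stamps, "value": numpy.ones(len(stamps))})

# Rows are days of the month, columns are months, cells are summed values.
wide = (data.assign(period=data["endDate"].dt.to_period("M"),
                    day=data["endDate"].dt.day)
            .pivot_table(index="day", columns="period", values="value",
                         aggfunc="sum"))

# Each column becomes one line on a shared axis.
ax = wide.plot(title="Steps by day of month")
```

Months shorter than 31 days get NaN in the missing rows, so their lines simply stop early.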

Or using seaborn:

import seaborn as sns

sns.lineplot(data.assign(period=data['endDate'].dt.to_period('M'),
                         day=data['endDate'].dt.day),
             x='day', hue='period', y='value')

Output: [plot image]
