Fill a table with data from a Pandas DataFrame

Question

I have to fill in the table below using weekly reports. Instead of doing it by hand, I'm trying to automate the process with a few lines of code.

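I've managed to aggregate the data from multiple input files, and now I'm struggling to paste the numbers into the table. A simple copy and transpose won't work, because some weeks have no data.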

This is the spreadsheet I need to fill in:

| Budget name | Sum[h] | 1 | 2 | 3 | ... |
| -------- | -------- | ------ | ------ | ------ | --- |
| Team_a_spending | 1000 | 1000 | | | |
| Team_b_spending | 1570 | 1570 | | | ... |

So far I have managed to aggregate the data:

from pathlib import Path
import pandas as pd
import numpy as np

path = r'C:\FUNDS'  # or unix / linux / mac path

# Get the files from the path provided in the OP
files = Path(path).glob('*.xlsx')  # .rglob to get subdirectories
dfs = list()
for f in files:
    data = pd.read_excel(f)
    # .stem is a pathlib attribute: the filename without the extension
    data['file'] = f.stem
    dfs.append(data)

df = pd.concat(dfs, ignore_index=True)
grouped_series = df.groupby(['file', 'Activity week'])['Time'].sum()

#print(grouped_series)
df_grouped = grouped_series.to_frame()
# print(df_grouped)
df_grouped.to_excel("TEAM_spending.xlsx",
                    sheet_name='Spending')
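To try the aggregation step without the Excel files, here is a minimal sketch with two hypothetical in-memory frames standing in for the weekly reports (the team names and hours are made up). It illustrates why a plain copy and transpose misaligns: a week with no rows produces no entry at all in grouped_series.

```python
import pandas as pd

# Hypothetical stand-ins for two weekly report files; in the question each
# frame comes from pd.read_excel and gets its 'file' column from Path.stem.
team_a = pd.DataFrame({'Activity week': [1, 1, 3], 'Time': [4.0, 6.0, 8.0]})
team_b = pd.DataFrame({'Activity week': [1, 2], 'Time': [5.0, 7.0]})
team_a['file'] = 'Team_a_spending'
team_b['file'] = 'Team_b_spending'

df = pd.concat([team_a, team_b], ignore_index=True)

# One summed value per (file, week) pair; weeks without rows are simply absent
# (Team_a_spending has nothing for week 2, Team_b_spending nothing for week 3).
grouped_series = df.groupby(['file', 'Activity week'])['Time'].sum()
print(grouped_series)
```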

The result is:

[Screenshot: total spending per budget]

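I don't know how to iterate over all the rows in column C and paste each value into the matching cell of the Spending spreadsheet, based on the activity week (columns) and the budget name (rows).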
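Rather than iterating over column C by position, one option is to place each value by label. A minimal sketch, assuming the budget names are the file names, the reporting period is weeks 1 to 4, and the numbers below are hypothetical:

```python
import pandas as pd

# Hypothetical aggregated data shaped like the question's grouped_series:
# a Series indexed by (budget name, week).
grouped_series = pd.Series(
    [10.0, 8.0, 5.0, 7.0],
    index=pd.MultiIndex.from_tuples(
        [('Team_a_spending', 1), ('Team_a_spending', 3),
         ('Team_b_spending', 1), ('Team_b_spending', 2)],
        names=['file', 'Activity week'],
    ),
    name='Time',
)

# Empty target table: one row per budget, one column per week (assumed 1..4).
weeks = range(1, 5)
budgets = grouped_series.index.get_level_values('file').unique()
table = pd.DataFrame(index=budgets, columns=weeks, dtype=float)

# Put each value into the cell addressed by its labels; untouched cells stay NaN.
for (budget, week), hours in grouped_series.items():
    table.loc[budget, week] = hours

# Row totals as the first column, like 'Sum[h]' in the template (NaN is skipped).
table.insert(0, 'Sum[h]', table.sum(axis=1))
print(table)
```

Because every value is addressed by (budget, week), a missing week leaves an empty cell instead of shifting later values to the left. The answer below reaches the same layout in one step with .unstack.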

Answer 1

It looks like you're almost there: you just need to .unstack your data and then apply some formatting to the Excel output.

import pandas as pd
from numpy.random import default_rng

rng = default_rng(0)

# starting from your "df_grouped" variable
index=pd.MultiIndex.from_product(
    [['Team_a_invest', 'Team_b_invest'], [*range(5)]],
    names=['Budget name', 'Productivity week'],
)
df_grouped = (
    pd.DataFrame(
       data={'Time': rng.normal(200, 40, size=len(index))},
       index=index,
    )
    .sample(frac=.9, random_state=rng) # introduce missing rows
    .sort_index()
)

print(
    df_grouped,
    #                                  Time
    # Budget name   Productivity week
    # Team_a_invest 0                  205.029209
    #               2                  225.616906
    #               3                  204.196005
    #               4                  178.573225
    # Team_b_invest 0                  214.463802
    #               1                  252.160002
    #               2                  237.883239
    #               3                  171.850591
    #               4                  149.383141

    (
        df_grouped['Time']
        .unstack(level='Productivity week')
        .assign(**{
            'Sum[h]': lambda d: d.sum(axis=1)
        })
        .fillna('')                                         # remove NaN values for easier copy/paste
        .loc[:, lambda d: [d.columns[-1], *d.columns[:-1]]] # put Sum[h] as first column
        .rename_axis(columns=None)                          # remove the name from the column Index (easier copy/paste)
        .reset_index()                                      # again, easier copy/paste
    ),
    #      Budget name       Sum[h]           0           1           2           3           4
    # 0  Team_a_invest   813.415345  205.029209              225.616906  204.196005  178.573225
    # 1  Team_b_invest  1025.740774  214.463802  252.160002  237.883239  171.850591  149.383141

    sep='\n\n'
)
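One caveat: .unstack only creates columns for weeks that appear for at least one budget, so a week in which nobody booked hours would be missing from the table entirely. A sketch, assuming the reporting period is weeks 1 to 4 and using hypothetical numbers, that adds such weeks back with .reindex before computing the totals:

```python
import pandas as pd

# Hypothetical aggregated data: no budget has any hours in week 2 or week 4.
grouped = pd.Series(
    [10.0, 8.0, 5.0],
    index=pd.MultiIndex.from_tuples(
        [('Team_a_invest', 1), ('Team_a_invest', 3), ('Team_b_invest', 1)],
        names=['Budget name', 'Productivity week'],
    ),
    name='Time',
)

all_weeks = range(1, 5)  # assumed reporting period: weeks 1..4

wide = (
    grouped
    .unstack(level='Productivity week')  # only weeks 1 and 3 become columns
    .reindex(columns=all_weeks)          # add weeks 2 and 4 as all-NaN columns
)
wide.insert(0, 'Sum[h]', wide.sum(axis=1))          # totals first, NaN skipped
wide = wide.rename_axis(columns=None).reset_index()  # 'Budget name' as a column
print(wide)
```

To get the blank cells from the answer, apply .fillna('') after the sum; DataFrame.to_excel(..., index=False) then writes the table without the integer row index.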

Of course, you could also use .pivot_table instead of the .groupby(…).unstack(…) pattern.
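For comparison, a sketch of the pivot_table version on hypothetical data with the question's column names. Both expressions sum Time per (file, Activity week) and leave weeks without data as NaN:

```python
import pandas as pd

# Hypothetical long-format data using the question's column names.
df = pd.DataFrame({
    'file': ['Team_a_spending', 'Team_a_spending', 'Team_b_spending'],
    'Activity week': [1, 1, 2],
    'Time': [4.0, 6.0, 7.0],
})

# groupby + unstack: sum per (file, week), then move weeks into columns.
via_unstack = df.groupby(['file', 'Activity week'])['Time'].sum().unstack()

# pivot_table: the same reshaping and aggregation in a single call.
via_pivot = df.pivot_table(index='file', columns='Activity week',
                           values='Time', aggfunc='sum')

print(via_pivot)
```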
