Pandas long to wide while keeping existing columns?

Votes: 0, Answers: 2

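I'm trying to manipulate a DataFrame in pandas and am running into some trouble. I've looked at several variations of this question asked here, but most of them use pivot and end up dropping some of the existing columns. I'm wondering whether there is a way around that.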

I've created some simple illustrative data that is similar to what I actually have:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns=raw_data.keys())

This produces a DataFrame that looks like this:

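(image: OutputFrame)

What I'd like to do is turn the months into columns while keeping the rest of the data, so something like this: (image: DesiredFrame)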

I've tried the suggestions from here: Pandas long to wide reshape, by two variables

I tried pivoting on Month:

frame.pivot(columns = 'Month')

(image: Fail1)

I tried adding more columns to see whether that would clean things up:

frame.pivot(columns = ('FirstName', 'LastName','Month'), values = 'Sales' )

(image: Fail2)

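In both cases I end up with some strange columns. I'm curious what pandas is doing here, but I can't make sense of it.

I suppose I could loop through and rebuild the data, but surely there must be a better way?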
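A minimal sketch of what the first attempt does, assuming a reasonably recent pandas (1.1 or later): with no index given, pivot() reuses the existing row index (0 to 5), so no rows get merged; with no values given, every remaining column (FirstName, LastName, Building, Sales) is spread across the months. The result has a two-level column header, and each row is empty (NaN) under the month it does not belong to.

```python
import pandas as pd

raw_data = {
    "FirstName": ["John", "Jill", "Jack", "John", "Jill", "Jack"],
    "LastName": ["Blue", "Green", "Yellow", "Blue", "Green", "Yellow"],
    "Building": ["Building1", "Building1", "Building2", "Building1", "Building1", "Building2"],
    "Month": ["November", "November", "November", "December", "December", "December"],
    "Sales": [100, 150, 275, 200, 150, 150],
}
frame = pd.DataFrame(raw_data)

# No index and no values: the existing 0..5 index stays the row index,
# and every non-"Month" column becomes a top-level column label.
fail1 = frame.pivot(columns="Month")

print(fail1.columns.nlevels)     # two header levels: (original column, month)
print(fail1.shape)               # still one row per original row
print(fail1.isna().sum().sum())  # cells where a row's month does not match
```

Because nothing tells pandas which columns identify a person, it cannot collapse John's November and December rows into one.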

python pandas
2 Answers

4 votes

You're actually almost there with pivot(). Specifying index will get you nearly all the way:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns=raw_data.keys())

df = frame.pivot(
    index=["FirstName", "LastName", "Building"],
    columns="Month",
    values="Sales",
)

df
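To see the MultiIndex the answer refers to, you can inspect df directly. This sketch rebuilds the same pivot (assuming pandas 1.1 or later, where pivot() accepts a list for index) and checks the index levels and column labels. Note that pivot() sorts the new month columns alphabetically, so December comes before November.

```python
import pandas as pd

frame = pd.DataFrame({
    "FirstName": ["John", "Jill", "Jack", "John", "Jill", "Jack"],
    "LastName": ["Blue", "Green", "Yellow", "Blue", "Green", "Yellow"],
    "Building": ["Building1", "Building1", "Building2", "Building1", "Building1", "Building2"],
    "Month": ["November", "November", "November", "December", "December", "December"],
    "Sales": [100, 150, 275, 200, 150, 150],
})

df = frame.pivot(index=["FirstName", "LastName", "Building"], columns="Month", values="Sales")

# The three identifying columns are now index levels, not regular columns.
print(df.index.names)
# New column labels are the sorted unique months; the column axis itself is named "Month".
print(list(df.columns), df.columns.name)
# One row per person, looked up by the full (FirstName, LastName, Building) key.
print(df.loc[("John", "Blue", "Building1")])
```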

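The only difference is that your DataFrame will have a MultiIndex. To get exactly the output you want, move the MultiIndex back into regular columns and clear the name on the column axis (you can chain these calls):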

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns=raw_data.keys())

df = (
    frame.pivot(
        index=["FirstName", "LastName", "Building"],
        columns="Month",
        values="Sales",
    )
    .reset_index()              # move the index levels back into regular columns
    .rename_axis(None, axis=1)  # drop the "Month" name from the column axis
)

df
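For comparison, here is a swapped-in alternative that is not part of the answer above: the same reshape written with set_index() plus unstack(), which is roughly what pivot() does internally. The sketch assumes the same sample data and checks that both routes give identical frames.

```python
import pandas as pd

frame = pd.DataFrame({
    "FirstName": ["John", "Jill", "Jack", "John", "Jill", "Jack"],
    "LastName": ["Blue", "Green", "Yellow", "Blue", "Green", "Yellow"],
    "Building": ["Building1", "Building1", "Building2", "Building1", "Building1", "Building2"],
    "Month": ["November", "November", "November", "December", "December", "December"],
    "Sales": [100, 150, 275, 200, 150, 150],
})
id_cols = ["FirstName", "LastName", "Building"]

via_pivot = (
    frame.pivot(index=id_cols, columns="Month", values="Sales")
    .reset_index()
    .rename_axis(None, axis=1)
)

# Same reshape by hand: put the identifiers and the month in the index,
# select Sales, then unstack the "Month" level into columns.
via_unstack = (
    frame.set_index(id_cols + ["Month"])["Sales"]
    .unstack("Month")
    .reset_index()
    .rename_axis(None, axis=1)
)

print(via_unstack)
```

The unstack() form can be handy when the reshape is one step in a longer chain that already works on an indexed frame.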

0 votes

I upvoted Murilo Cunha's answer above.

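If you have a larger DataFrame and want a more general way to widen a single column, you can modify Murilo's answer as follows so that the pivot index covers all the other columns without naming each one: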

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2", "Building1","Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns=raw_data.keys())

col_to_wide = "Month"
vals = "Sales"
# keep all other columns
keep_cols = [col for col in frame.columns if col not in [col_to_wide, vals]]

df = (
    frame.pivot(
        index=keep_cols,
        columns=col_to_wide,
        values=vals,
    )
    .reset_index()              # move the index levels back into regular columns
    .rename_axis(None, axis=1)  # drop the column-axis name (here "Month")
)

df
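If you need this repeatedly, the same idea can be wrapped in a small function. The name long_to_wide and its order parameter are hypothetical, introduced only for illustration. Since pivot() sorts the new column labels (December before November), the optional order lets the caller put them back in a meaningful sequence.

```python
import pandas as pd


def long_to_wide(frame, col_to_wide, vals, order=None):
    """Hypothetical helper: spread `col_to_wide` into columns, keeping all other columns."""
    # Every column except the one being widened and the values column identifies a row.
    keep_cols = [c for c in frame.columns if c not in (col_to_wide, vals)]
    wide = frame.pivot(index=keep_cols, columns=col_to_wide, values=vals)
    if order is not None:
        # pivot() sorts the new labels; reindex to the caller's order instead.
        wide = wide.reindex(columns=order)
    return wide.reset_index().rename_axis(None, axis=1)


raw_data = {
    "FirstName": ["John", "Jill", "Jack", "John", "Jill", "Jack"],
    "LastName": ["Blue", "Green", "Yellow", "Blue", "Green", "Yellow"],
    "Building": ["Building1", "Building1", "Building2", "Building1", "Building1", "Building2"],
    "Month": ["November", "November", "November", "December", "December", "December"],
    "Sales": [100, 150, 275, 200, 150, 150],
}
frame = pd.DataFrame(raw_data)

wide = long_to_wide(frame, "Month", "Sales", order=["November", "December"])
print(wide)
```

One caveat: if the same combination of kept columns and month appears more than once, pivot() raises a ValueError about duplicate entries. In that case pivot_table() with an aggfunc is the usual substitute.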