如何使用NaN将合并的Excel单元格读入Pandas DataFrame

问题描述 投票:7回答:2

我想将Excel表格读入Pandas DataFrame。但是,有合并的Excel单元格和Null行(填充完整/部分NaN),如下所示。为了澄清,John H.已经下令购买从“The Bodyguard”到“Red Pill Blues”的所有专辑。

Excel sheet capture

当我将此Excel工作表读入Pandas DataFrame时,Excel数据无法正确传输。熊猫将合并的细胞视为一个细胞。 DataFrame如下所示:(注意:()中的值是我想要的值所需的值)

Dataframe capture

请注意,最后一行不包含合并的单元格;它只携带Artist列的值。


EDIT: I did try the following to forward-fill in the NaN values:(Pandas: Reading Excel with merged cells)
df.index = pd.Series(df.index).fillna(method='ffill')  

但是,NaN值仍然存在。我可以使用什么策略或方法正确填充DataFrame?是否有一个Pandas方法来取消合并细胞并复制相应的内容?

python excel python-3.x pandas
2个回答
4
投票

您尝试转发所需的引用链接仅填充索引列。对于您的用例,您需要为所有数据帧列fillna。因此,只需转发填充整个数据帧:

df = pd.read_excel("Input.xlsx")
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       NaN           NaN              Lemonade          Beyonce       1.0
# 3       NaN           NaN  The Thrill Of It All        Sam Smith       2.0
# 4       NaN           NaN              Thriller  Michael Jackson      11.0
# 5       NaN           NaN                Divide       Ed Sheeran       4.0
# 6       NaN           NaN            Reputation     Taylor Swift       3.0
# 7       NaN           NaN        Red Pill Blues         Maroon 5       5.0

df = df.fillna(method='ffill')
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       1.0       John H.              Lemonade          Beyonce       1.0
# 3       1.0       John H.  The Thrill Of It All        Sam Smith       2.0
# 4       1.0       John H.              Thriller  Michael Jackson      11.0
# 5       1.0       John H.                Divide       Ed Sheeran       4.0
# 6       1.0       John H.            Reputation     Taylor Swift       3.0
# 7       1.0       John H.        Red Pill Blues         Maroon 5       5.0

0
投票

使用条件:

import pandas as pd

df_excel = pd.ExcelFile('Sales.xlsx')
df = df_excel.parse('Info')

for col in list(df):  # All columns
    pprow = 0
    prow = 1
    for row in df[1:].iterrows():  # All rows, except first
        if pd.isnull(df.loc[prow, 'Album Name']):  # If this cell is empty all in the same row too.
            continue
        elif pd.isnull(df.loc[prow, col]) and pd.isnull(df.loc[row[0], col]):  # If a cell and next one are empty, take previous valor. 
            df.loc[prow, col] = df.loc[pprow, col]
        pprow = prow
        prow = row[0]

输出(我用不同的名字):

    Order_ID Customer_name    Album Name
0        NaN           NaN         Radio
1        1.0          John            a 
2        1.0          John             b
3        1.0          John             c
4        1.0          John             d
5        1.0          John             e
6        1.0          John             f
7        NaN           NaN            GE
8        2.0         Harry   We are Born
9        3.0        Lizzy        Relapse
10       4.0           Abe         Smoke
11       4.0           Abe       Tell me
12       NaN           NaN           NaN
13       NaN           NaN      Best Buy
14       5.0        Kristy      The wall
15       6.0         Sammy  Kind of blue
© www.soinside.com 2019 - 2024. All rights reserved.