How to merge rows when the number of columns is dynamic


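I am working on an AI project that involves processing a large number of DataFrames in Python. I am trying to append values to a DataFrame, but I want the number of columns to follow the number of columns in the input DataFrame a. rowMerger is a function that takes two arguments (a and b): a is the DataFrame we supply, and b is the DataFrame we want the function to return. The function lets me merge rows when a has five columns.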

import pandas as pd

def rowMerger(a,b):
    try:
        # b is rebuilt here with one column name per column of a
        b = pd.DataFrame(data=None, columns =[f'Column{i}' for i in range(0, len(a.columns))])
        rule1 = lambda x: x not in ['']
        # rows whose first three columns are all non-empty start a new merged row
        u = a.loc[a['Column0'].apply(rule1) & a['Column1'].apply(rule1) & a['Column2'].apply(rule1)].index
        findMergerindexs = list(u)
        findMergerindexs.sort()
        a = pd.DataFrame(a)
        if (len(findMergerindexs) > 0):
            for m in range(len(findMergerindexs)):
                if not (m == (len(findMergerindexs)-1)):
                    startLoop = findMergerindexs[m]
                    endLoop = findMergerindexs[m+1]
                else:
                    startLoop = findMergerindexs[m]
                    endLoop = len(a)
                # one accumulator per column, hard-coded for five columns
                Column0 = ''
                Column1 = ''
                Column2 = ''
                Column3 = ''
                Column4 = ''
                for n in range(startLoop,endLoop):
                    Column0 = Column0 + str(a.iloc[n,0])
                    Column1 = Column1 + str(a.iloc[n,1])
                    Column2 = Column2 + str(a.iloc[n,2])
                    Column3 = Column3 + str(a.iloc[n,3])
                    Column4 = Column4 + str(a.iloc[n,4])
                # NOTE: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat
                b = b.append({'Column0': Column0.strip(), 'Column1': Column1.strip(), 'Column2': Column2.strip(), 'Column3': Column3.strip(), 'Column4': Column4.strip()}, ignore_index=True)
        else:
            print("File is not having a row for merging instances - Please check the file manually for instance - ")
    except:
        print("Error - While merging the rows")
    return b

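The function above is what I use to merge rows so that the blank gaps between rows are eliminated. For example, I have a DataFrame like the one below.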

df=[['7','4','5','7','8'],["","","",'7','4'],['9','4','7','8','4'],["","","",'7','5'],['4','8','5','4','6']]
df=pd.DataFrame(df)
df.columns=[f'Column{i}' for i in range(0, len(df.columns))]



Column0 Column1 Column2 Column3 Column4
7       4       5       7       8 
                        7       4
9       4       7       8       4
                        7       5
4       8       5       4       6

The rowMerger function removes the blanks between rows and gives me the DataFrame shown below.

rowMerger(df,0)

Column0 Column1 Column2 Column3 Column4
7       4       5       77      84
9       4       7       87      45
4       8       5       4       6

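However, this function is not dynamic: the number of columns in b is fixed by hand. Instead, I want the number of columns produced inside the function to follow the number of columns in a. For example, if a has three columns, I want to create three columns (Column0, Column1, Column2), append values to them, and return a DataFrame with three columns.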

I have tried my best, but this is beyond my ability. I am still learning Python, and I would greatly appreciate any help.

python pandas append conda
1 Answer

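Here is a function that may help. It works on the example you provided, but you will have to adapt it for many other cases. The idea is to find the rows that contain empty strings, get the columns for those rows, combine them, and pass the result back into the original DataFrame. I have added comments in the code; hopefully they explain it well. Let me know how it goes. Others may have better options, so play around with it and see.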

import pandas as pd

def process_data(df):

    #convert to string
    #easier to merge rows
    df = df.astype(str)

    #find rows where there are empty strings
    empty_rows_index = df.loc[df.eq('').any(axis=1)].index

    #find columns where there are no empty strings
    non_empty_cols = df.loc[:,df.ne('').all()].columns.tolist()

    #pair each row containing empty strings with the row directly above it
    #(relies on a default 0..n-1 integer index)
    empty_rows_pair = [[ind-1,ind] for ind in empty_rows_index]

    #pair index with columns
    rows_cols = [[entry,non_empty_cols] for entry in empty_rows_pair]

    #summing strings concatenates them, so this joins each
    #empty-string row with the row above it, column by column
    lump = [df.loc[x,y].sum().astype('int') for x,y in rows_cols]

    #combine and flip, so that the column names are the headers
    merger = pd.concat(lump,axis=1).T

    #to ensure complete reintegration back to the dataframe
    #set the merger index to the previous row index
    merger.index = [i for i,j in empty_rows_pair]

    #drop the empty string rows
    df = df.drop(empty_rows_index)

    #set the rows in df to match with
    #the rows and columns in merger
    #and assign merger to that section
    df.loc[merger.index,merger.columns] = merger

    df = df.astype(int).reset_index(drop=True)
    return df

process_data(df)

    Column0 Column1 Column2 Column3 Column4
0       7      4       5      77     84
1       9      4       7      87     45
2       4      8       5      4      6
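
If the data can have several continuation rows in a row, or a different number of columns, a groupby-based variant might be easier to adapt. The following is only a sketch under stated assumptions: the name merge_rows and the key_cols parameter are made up for illustration, and it assumes a row starts a new record when all of its key columns are non-empty (by default just the first column, which is looser than the three-column rule in the question). Unlike process_data, it keeps the values as strings.

```python
import pandas as pd


def merge_rows(a, key_cols=None):
    """Concatenate each continuation row into the nearest record row above it.

    Hypothetical helper: key_cols names the columns that must all be
    non-empty for a row to start a new record (assumption: first column).
    """
    # Work on strings so that merging is plain concatenation.
    a = a.astype(str).reset_index(drop=True)
    if key_cols is None:
        key_cols = [a.columns[0]]

    # True for rows that start a new record.
    starts = a[key_cols].ne('').all(axis=1)
    if not starts.any():
        # Nothing to anchor a merge on; return an empty frame with the same columns.
        return pd.DataFrame(columns=a.columns)

    # The running count of start rows gives every row a group id.
    # Rows before the first start row get id 0 and are dropped.
    group_id = starts.cumsum()
    keep = group_id > 0
    a, group_id = a[keep], group_id[keep]

    # Join the strings of each group column by column; this works for any
    # number of columns because no column name is hard-coded.
    merged = a.groupby(group_id).agg(''.join)
    merged = merged.apply(lambda col: col.str.strip())
    return merged.reset_index(drop=True)


df = pd.DataFrame([['7', '4', '5', '7', '8'], ['', '', '', '7', '4'],
                   ['9', '4', '7', '8', '4'], ['', '', '', '7', '5'],
                   ['4', '8', '5', '4', '6']])
df.columns = [f'Column{i}' for i in range(len(df.columns))]
print(merge_rows(df))
```

On the example frame this should give the same three rows as above (as strings), and a three-column input comes back with three columns. To reproduce the rule from the question, pass key_cols=['Column0', 'Column1', 'Column2'].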