I have the following DataFrame, in which the Date column is broken across rows:
index Date Particulars
0 01-12- AVON AGRO
1 2018 NaN
2 01-12- CASH
3 2018 NaN
4 03-12- NEFTOut/UTBIN18337459966/LUNI
5 2018 A MARKETING/SBIN00019
6 03-12- ANJANI TRADERS
7 2018 NaN
8 03-12- NEFTOut/UTBIN18337484160/BIGS
9 2018 MILE PRODUCTS/UTIB000
But I want the following output:
index Date Particulars
0 01-12-2018 AVON AGRO
2 01-12-2018 CASH
4 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN00019
6 03-12-2018 ANJANI TRADERS
8 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTIB000
I tried df.apply(lambda x: x if re.search('\d{4}$', str(x)) else str(x.shift(-1)) + str(x)), but it gave me:
Date 0 2018\n1 01-12-\n2 2018...
Particulars 0 NaN\n1 ...
dtype: object
First replace the missing values with empty strings, then aggregate each consecutive pair of rows with groupby and join:
df1 = df.fillna('').groupby(df.index // 2).agg(''.join)
print (df1)
Date Particulars
index
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
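For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby idea. The variable name pair_id is only illustrative, and grouping by np.arange(len(df)) // 2 instead of df.index // 2 is a small variation that assumes nothing about the index labels, only that the rows come in consecutive pairs.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-", "2018",
             "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

# Positional group ids 0, 0, 1, 1, ... (hypothetical helper name);
# these do not depend on the index being 0..n-1.
pair_id = np.arange(len(df)) // 2

# Empty strings instead of NaN so that ''.join works on every group.
df1 = df.fillna("").groupby(pair_id).agg("".join)
print(df1)
```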
Or select the even and odd rows by position and add them together:
df1 = df.fillna('')
df1 = df1.iloc[::2].reset_index(drop=True) + df1.iloc[1::2].reset_index(drop=True)
print (df1)
Date Particulars
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
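The desired output in the question keeps the original labels 0, 2, 4, 6, 8. If that matters, one possible variation of the positional approach is to give the odd rows the labels of the even rows before adding, instead of resetting both. This is only a sketch: the names filled, even, odd and result are illustrative, and it assumes the frame has an even number of rows.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-", "2018",
             "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

filled = df.fillna("")

# Rows 0, 2, 4, ... hold the first half of each record.
even = filled.iloc[::2]

# Rows 1, 3, 5, ... hold the second half; relabel them so they align
# with the even rows (assumes an even number of rows overall).
odd = filled.iloc[1::2].copy()
odd.index = even.index

# Element-wise string concatenation, aligned on the shared labels.
result = even + odd
print(result)
```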
A regex-based solution is also possible:
df1 = df.fillna('')
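# True for rows whose Date ends in a four-digit year (raw string avoids an invalid-escape warning)
m = df1['Date'].str.contains(r'\d{4}$')
# Rows just before a year row hold the first half; fill_value keeps the shifted mask boolean
df1 = df1[m.shift(-1, fill_value=False)].reset_index(drop=True) + df1[m].reset_index(drop=True)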
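As a runnable illustration of the mask-based idea, the sketch below rebuilds the sample frame and splits the steps into named pieces. The names is_year, starts_pair, head and tail are hypothetical, and shift(-1, fill_value=False) is used here in place of shift(-1).fillna(False) so the mask stays boolean.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-", "2018",
             "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

filled = df.fillna("")

# True where Date ends with four digits, i.e. the row carrying the year.
is_year = filled["Date"].str.contains(r"\d{4}$")

# True for the row directly above each year row (the first half of a record).
starts_pair = is_year.shift(-1, fill_value=False)

# Align both halves on a fresh 0..n-1 index, then concatenate the strings.
head = filled[starts_pair].reset_index(drop=True)
tail = filled[is_year].reset_index(drop=True)
result = head + tail
print(result)
```

Unlike the positional slicing, this does not rely on fixed even and odd positions, but it does assume that every year row is immediately preceded by its first half.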