Python Pandas: matching strings based on a prefix


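I have the code below, which uses pd.read_csv to parse a text file of hostnames and groups them by prefix; this works well. However, there is now a requirement that for sj12 the character right after the prefix must be a letter, i.e. sj12 should match sj12[a-z], for example sj12a001, sj12u003, and so on.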
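To make the requirement concrete, here is a minimal sketch of the pattern check on its own. The sample hostnames are hypothetical and are not read from the real new_hosts file; `Series.str.match` anchors the pattern at the start of each string.

```python
import pandas as pd

# Hypothetical sample hostnames, not taken from the real 'new_hosts' file.
hosts = pd.Series(['sj12a001', 'sj12u003', 'sj124000', 'sj125000'])

# str.match tests the pattern from the start of each string, so only
# hostnames with a lowercase letter right after 'sj12' pass the check.
is_letter_host = hosts.str.match(r'sj12[a-z]')

print(hosts[is_letter_host].tolist())  # ['sj12a001', 'sj12u003']
```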

I am looking for a way to do this in Pandas:

#!/grid/common/pkgs/python/v3.6.1/bin/python3
import pandas as pd
import numpy as np

prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50']

df = pd.read_csv('new_hosts', index_col=False, header=None)
df['prefix'] = df[0].str[:4]
df['grp'] = df.groupby('prefix').cumcount()
df = df.pivot(index='grp', columns='prefix', values=0)

# Drop rows in which every value is NaN, then replace the remaining NaN with ''
df = df[ prefixes ].dropna(axis=0, how='all').replace(np.nan, '', regex=True)
df = df.rename_axis(None)

Current output of the code above:

sj00        sj12        cr00        cr08        eu00        eu50
sj000001    sj124000    cr000011    crn00001    euk000011   eu5000011
sj000002    sj125000    cr000012    crn00002    eu0000012   eu5000013
sj000003    sj12at00    cr000013    crn00003    eu0000013   eu5000014
sj000004    sj12bt00    cr000014    crn00004    eu0000014   eu5000015

Expected output:

    sj00        sj12        cr00        cr08        eu00        eu50
    sj000001    sj12at00    cr000011    crn00001    euk000011   eu5000011
    sj000002    sj12bt00    cr000012    crn00002    eu0000012   eu5000013
    sj000003                cr000013    crn00003    eu0000013   eu5000014
    sj000004                cr000014    crn00004    eu0000014   eu5000015

In the expected output above, you can see that sj124000 and sj125000 have been removed.

python-3.x pandas group-by
1 Answer

I solved it with the str.extract method.

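# \w also matches digits, so the fifth position uses [a-z]; non-matching values become NaN
df['sj12'] = df['sj12'].str.extract(r'(\w\w\d\d[a-z]\w*)', expand=False)

# Equivalent pattern written with quantifiers
df['sj12'] = df['sj12'].str.extract(r'(\w{2}\d{2}[a-z]\w*)', expand=False)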
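A possible alternative, sketched here under assumptions rather than taken from the answer, is to filter the hostnames before pivoting, so that the remaining sj12 hosts move up to the top rows as in the expected output. The in-memory sample data and the `letter_prefixes` list are hypothetical names introduced for illustration; the sample stands in for the new_hosts file and uses only three of the prefixes.

```python
import io
import pandas as pd

# Hypothetical stand-in for the 'new_hosts' file (one hostname per line).
raw = io.StringIO("\n".join([
    "sj000001", "sj000002",
    "sj124000", "sj125000", "sj12at00", "sj12bt00",
    "cr000011", "cr000012",
]))

prefixes = ['sj00', 'sj12', 'cr00']
letter_prefixes = ['sj12']  # assumption: prefixes that must be followed by a letter

df = pd.read_csv(raw, index_col=False, header=None)
df['prefix'] = df[0].str[:4]

# Keep a row unless its prefix requires a letter in the fifth position
# and the hostname does not have one there.
needs_letter = df['prefix'].isin(letter_prefixes)
fifth_is_letter = df[0].str.match(r'.{4}[a-z]')
df = df[~needs_letter | fifth_is_letter].copy()

# Number the hosts within each prefix, then spread prefixes into columns.
df['grp'] = df.groupby('prefix').cumcount()
wide = df.pivot(index='grp', columns='prefix', values=0)
wide = wide[prefixes].dropna(axis=0, how='all').fillna('').rename_axis(None)

print(wide)
```

Because the filter runs before `cumcount`, the surviving sj12 hosts are numbered from 0 and therefore land in the first rows of the pivoted table instead of leaving gaps at the top.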