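I have a dataframe like the one shown below.
I need to remove items shorter than 4 characters from the CityIds column. There may or may not be a space after each comma, and the column holds thousands of elements.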
CityIds |
---|
98765, 98-oki, th6, iuy89, 8.90765 |
89ol, gh98.0p, klopi, th, loip |
98087,PAKJIYT, hju, yu8oi, iupli |
For example: I want to drop th6, or show th6 in a separate column.
Extract the items whose length is 4 or more and join them back together:
df['CityIds'] = df['CityIds'].str.findall(r'([^\s,]{4,})').str.join(', ')
CityIds
0 98765, 98-oki, iuy89, 8.90765
1 89ol, gh98.0p, klopi, loip
2 98087, PAKJIYT, yu8oi, iupli
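To see what the pattern does outside pandas, and to also collect the short items the question mentions, here is a minimal sketch using only the standard `re` module. The names `TOKEN` and `split_ids` are hypothetical helpers introduced for illustration; they are not part of the answer above.

```python
import re

# A token is a run of characters that are neither whitespace nor commas.
TOKEN = re.compile(r'[^\s,]+')

def split_ids(row, min_len=4):
    """Return (kept, removed) strings for one comma-separated row."""
    tokens = TOKEN.findall(row)
    kept = [t for t in tokens if len(t) >= min_len]
    removed = [t for t in tokens if len(t) < min_len]
    return ', '.join(kept), ', '.join(removed)

rows = [
    '98765, 98-oki, th6, iuy89, 8.90765',
    '89ol, gh98.0p, klopi, th, loip',
    '98087,PAKJIYT, hju, yu8oi, iupli',
]
results = [split_ids(r) for r in rows]
for kept, removed in results:
    print(kept, '|', removed)
```

Assuming the column holds plain strings, the same function could be applied row by row with `df['CityIds'].apply(split_ids)` and the resulting tuples unpacked into two columns.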
The answer above is clearly cleaner; however, here I add a new column for the excluded IDs:
import pandas as pd

d = {'cityIDs': ['98765, 98-oki, th6, iuy89, 8.90765',
                 '89ol, gh98.0p, klopi, th, loip',
                 '98087, PAKJIYT, hju, yu8oi, iupli']}
df = pd.DataFrame(data=d)
n = len(df['cityIDs'])
df['rmvdIDs'] = ['' for _ in range(n)]
for i in range(n):
    row = df.loc[i, 'cityIDs']
    # Strip all whitespace, then split on commas
    cityIDs = "".join(row.split()).split(',')
    new_IDs = [x for x in cityIDs if len(x) >= 4]
    # Filter with a list rather than a set difference so the original order is kept
    excl_IDs = [x for x in cityIDs if len(x) < 4]
    # Write with .loc; chained assignment (df['col'][i] = ...) may not update df
    df.loc[i, 'cityIDs'] = ", ".join(new_IDs)
    df.loc[i, 'rmvdIDs'] = ", ".join(excl_IDs)
print(df)
This prints:
cityIDs rmvdIDs
0 98765, 98-oki, iuy89, 8.90765 th6
1 89ol, gh98.0p, klopi, loip th
2 98087, PAKJIYT, yu8oi, iupli hju
-- Hope this helps