I have a DataFrame with about 10,000 values that looks like this:
+------------+
| id |
+------------+
| 12-4253 |
+------------+
| 24-3521-01 |
+------------+
| 46-745 |
+------------+
| 13-2131-02 |
+------------+
I'd like to check whether a cell contains two dashes and, if it does, remove the second dash and everything after it, ending up with:
+-----------+
| id |
+-----------+
| 12-4253 |
+-----------+
| 24-3521 |
+-----------+
| 46-745 |
+-----------+
| 13-2131 |
+-----------+
Since a plain substring check can't really test for multiple occurrences of a substring, I figured I would do the following:
i = 0
for item in DF:
    item = str(item)  # Had to put this because of an issue where floats can't be sub-stringed?
    lastThree = item[-3:]
    if "-" in lastThree:
        correctItem = item[:-3]
        DF.set_value(i, 'id', correctItem)
    i += 1
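But this doesn't seem to work...
Can anyone point me to a more elegant and civilized solution? Are the last 3 characters being converted to a float, and is that why it can't find the hyphen?
Thanks!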
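A note on why the loop above changes nothing: iterating over a DataFrame directly yields its column labels, not its cell values, so `item` is just the string 'id' and never contains a dash. No float conversion of the last three characters is involved. Also, `set_value` was deprecated and later removed from pandas; `.at` is the usual replacement. Below is a minimal sketch of a working loop, assuming the column is named 'id' and holds the four sample values from the question (the names `labels` and `result` are only for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"id": ["12-4253", "24-3521-01", "46-745", "13-2131-02"]})

# Iterating a DataFrame directly yields column labels, not cell values.
labels = list(df)

# Loop over the values of the column instead, keeping each index label.
for idx, value in df["id"].items():
    text = str(value)
    if text.count("-") == 2:
        # Keep everything before the last dash.
        df.at[idx, "id"] = text.rsplit("-", 1)[0]

result = df["id"].tolist()
```

This works, but with around 10,000 rows the vectorized `.str` methods are shorter and usually faster than an explicit Python loop.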
Use pd.Series.str.split
df['id'].str.split('-', n=2).str[:2].str.join('-').to_frame()
id
0 12-4253
1 24-3521
2 46-745
3 13-2131
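To make the chain easier to follow, here is a sketch of the same idea split into steps. The missing value in the sample is my own assumption, added because the question mentions floats (which is often how NaN shows up in a string column); the `.str` methods simply pass missing values through.

```python
import pandas as pd

# Hypothetical sample: two ids from the question plus one missing value.
s = pd.Series(["24-3521-01", "46-745", None], name="id")

# Step 1: split on '-' at most twice, giving lists of up to three pieces.
parts = s.str.split("-", n=2)

# Step 2: keep only the first two pieces of each list.
first_two = parts.str[:2]

# Step 3: glue the kept pieces back together with a dash.
trimmed = first_two.str.join("-")

out = trimmed.tolist()
```

Rows with a single dash come back unchanged, and the missing row stays missing, so there is no need for the `str(item)` workaround.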
You can use extract:
df = df['id'].str.extract(r'^(\d+-\d+)', expand=False)
print(df)
0 12-4253
1 24-3521
2 46-745
3 13-2131
Name: id, dtype: object
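If you would rather keep the DataFrame and overwrite the column in place, a small variation along these lines should do it. This is only a sketch, assuming every id starts with digits, a dash, then more digits; any row that does not match the pattern would come back as NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"id": ["12-4253", "24-3521-01", "46-745", "13-2131-02"]})

# Raw string so that \d reaches the regex engine unchanged.
# The single capture group keeps "digits-digits" from the start of each value.
df["id"] = df["id"].str.extract(r"^(\d+-\d+)", expand=False)

out = df["id"].tolist()
```

With `expand=False` and a single capture group, `extract` returns a Series, which is why it can be assigned straight back to the column.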