Conditionally strip the last 3 characters in a DataFrame depending on whether a dash is present

Question

I have a DataFrame of roughly 10,000 values that looks like this:

+------------+
| id         |
+------------+
| 12-4253    |
+------------+
| 24-3521-01 |
+------------+
| 46-745     |
+------------+
| 13-2131-02 |
+------------+

I'd like to check whether a cell contains two dashes and, if so, remove the second dash and everything after it, ending up with:

+-----------+
| id        |
+-----------+
| 12-4253   |
+-----------+
| 24-3521   |
+-----------+
| 46-745    |
+-----------+
| 13-2131   |
+-----------+

Since a plain substring check can't tell me whether the substring occurs more than once, I figured I'd do the following:

i = 0
for item in DF:
    item = str(item) # Had to put this because of an issue where floats can't be sub-stringed?
    lastThree = item[-3:]

    if "-" in lastThree:
        correctItem = item[:-3]
        DF.set_value(i, 'id', correctItem)
    i+=1

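But this doesn't seem to work...

Can anyone point me to a more elegant and civilized solution? Are the last 3 characters being converted to a float, and is that why it can't find the hyphen?

Thanks!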

Answer 1

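Use pd.Series.str.split, limiting the number of splits with n=2, then keep the first two pieces and join them back together:

df['id'].str.split('-', n=2).str[:2].str.join('-').to_frame()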

        id
0  12-4253
1  24-3521
2   46-745
3  13-2131
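
For anyone who wants to run the above end to end, here is a minimal sketch. It assumes the four sample ids from the question sit in a string column named `id`; the helper name `trim_after_second_dash` is made up for illustration, and the `astype(str)` call is only a guess at how to handle the non-string cells the question hints at.

```python
import pandas as pd


def trim_after_second_dash(ids: pd.Series) -> pd.Series:
    """Keep only the first two dash-separated parts of each id."""
    # astype(str) is an assumption: it guards against non-string cells,
    # but note that it also turns missing values into the text "nan".
    parts = ids.astype(str).str.split('-', n=2)
    # Take the first two pieces of each split list and glue them back together.
    return parts.str[:2].str.join('-')


# Sample data copied from the question.
df = pd.DataFrame({'id': ['12-4253', '24-3521-01', '46-745', '13-2131-02']})
df['id'] = trim_after_second_dash(df['id'])
print(df)
```

Ids with a single dash pass through unchanged, because splitting them yields only two pieces in the first place.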

Answer 2

You can use extract with a pattern that captures the digits before and after the first dash:

df = df['id'].str.extract(r'^(\d+-\d+)', expand=False)
print(df)
0    12-4253
1    24-3521
2     46-745
3    13-2131
Name: id, dtype: object
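
As a supplementary note on why the loop in the question had no effect: iterating over a DataFrame directly yields its column labels rather than its cell values, so `item` was the string 'id' and never contained a dash. `DataFrame.set_value` has also since been removed from pandas. The following is a minimal sketch, assuming the same sample data as the question, that shows this behaviour and one possible "only if there are two dashes" variant using a boolean mask. The variable name `has_two_dashes` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'id': ['12-4253', '24-3521-01', '46-745', '13-2131-02']})

# Iterating a DataFrame yields column labels, not cell values.
print(list(df))  # ['id']

# Flag only the rows whose id contains exactly two dashes.
has_two_dashes = df['id'].str.count('-') == 2

# For those rows, split once from the right and keep the left-hand part;
# rows with a single dash are left untouched.
df.loc[has_two_dashes, 'id'] = (
    df.loc[has_two_dashes, 'id'].str.rsplit('-', n=1).str[0]
)
print(df)
```

Unlike the split and extract answers, this variant leaves ids with three or more dashes alone, which may or may not be what is wanted.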
