I have a pandas DataFrame with the following columns:
id text_col
1 Was it Accurate?: Yes\n\nReasoning: This is a sample : text
2 Was it Accurate?: Yes\n\nReasoning: This is a :sample 2 text
3 Was it Accurate?: No\n\nReasoning: This is a sample: 1. text
I need to split text_col into two columns,
"Was it Accurate?"
and "Reasoning".
The final DataFrame should look like this:
id Was it Accurate? Reasoning
1 Yes This is a sample : text
2 Yes This is a :sample 2 text
3 No This is a sample: 1. text
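The text values can contain multiple colons (":").
I tried splitting text_col on "Reasoning:", but did not get the desired result: it drops the text after the second colon (:).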
df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(r'Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)')
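Extracting the columns with a regular expression
You can use the str.extract method with a regular expression to split text_col into two columns. The capture groups in the pattern pick out the parts of the text you are interested in. Because (.*) matches everything up to the end of the line, any extra colons inside the reasoning text are kept.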
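# Group 1 captures Yes/No; group 2 captures everything after "Reasoning: ", colons included
pattern = r'Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)'
df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(pattern)
# Drop the original combined column
df = df.drop(columns=['text_col'])
print(df)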
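For reference, here is a self-contained sketch of the same idea that rebuilds the sample data from the question. It assumes the "\n\n" in text_col are real newline characters rather than a literal backslash followed by "n". The `\s*` separators and the `re.DOTALL` flag are additions of mine, not part of the pattern above: they are meant to tolerate variable whitespace between the two fields and to keep reasoning text that spans several lines.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumption: "\n\n" are real newlines).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample : text",
        "Was it Accurate?: Yes\n\nReasoning: This is a :sample 2 text",
        "Was it Accurate?: No\n\nReasoning: This is a sample: 1. text",
    ],
})

# Group 1: the Yes/No answer. Group 2: everything after "Reasoning:",
# including any further colons. \s* absorbs spaces and newlines between fields.
pattern = r"Was it Accurate\?:\s*(Yes|No)\s*Reasoning:\s*(.*)"

# re.DOTALL lets "." match newlines, so multi-line reasoning is not cut off.
extracted = df["text_col"].str.extract(pattern, flags=re.DOTALL)
extracted.columns = ["Was it Accurate?", "Reasoning"]

# Keep id and attach the two new columns; text_col is left out of the result.
result = pd.concat([df[["id"]], extracted], axis=1)
print(result)
```

Rows that do not match the pattern come back as NaN in both new columns, so it may be worth checking `result.isna().any()` on real data.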