Split a pandas DataFrame column into multiple columns based on text values

Question   Votes: 0   Answers: 1

I have a pandas DataFrame with a text column.

id    text_col
1     Was it Accurate?: Yes\n\nReasoning: This is a sample : text
2     Was it Accurate?: Yes\n\nReasoning: This is a :sample 2 text
3     Was it Accurate?: No\n\nReasoning: This is a sample: 1. text
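
For anyone who wants to reproduce this, the sample above can be built roughly as follows. This is only a sketch: it assumes the `\n\n` shown in text_col stands for two real newline characters rather than a literal backslash followed by "n".

```python
import pandas as pd

# Sample rows from the table above; "\n\n" is assumed to be two real newlines.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample : text",
        "Was it Accurate?: Yes\n\nReasoning: This is a :sample 2 text",
        "Was it Accurate?: No\n\nReasoning: This is a sample: 1. text",
    ],
})

print(df)
```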

I need to split text_col into two columns:

"Was it Accurate?"
"Reasoning"

The final DataFrame should look like this:

id    Was it Accurate?    Reasoning
1     Yes                 This is a sample : text
2     Yes                 This is a :sample 2 text
3     No                  This is a sample: 1. text

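The text values can contain more than one colon (":").

I tried splitting text_col on "Reasoning:", but did not get the desired result: the text after the second colon (:) is dropped.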

df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(r'Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)')

python python-3.x pandas dataframe
1 Answer
0
votes

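Extracting the columns with a regular expression: you can use the str.extract method with a regular expression to split text_col into two columns. Each capture group in the pattern becomes one output column, and because the last group is (.*), everything after "Reasoning: " is kept, including any further colons.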

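# Define the regex pattern (\s+ matches the blank line between the two fields)
pattern = r'Was it Accurate\?: (Yes|No)\s+Reasoning: (.*)'

# Extract the two capture groups into new columns
df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(pattern)

# Drop the original 'text_col' if it is no longer needed
df = df.drop(columns=['text_col'])

print(df)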
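
As a self-contained check of the approach above, the sketch below rebuilds the three sample rows and runs the extraction end to end. It assumes the two fields are separated by real newlines. The `(?s)` flag and the `\s*` after each colon are additions for illustration, not part of the original code: they let the reasoning text span several lines and tolerate missing spaces.

```python
import pandas as pd

# Sample rows from the question; "\n\n" is assumed to be two real newlines.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample : text",
        "Was it Accurate?: Yes\n\nReasoning: This is a :sample 2 text",
        "Was it Accurate?: No\n\nReasoning: This is a sample: 1. text",
    ],
})

# (?s) lets "." match newlines, so multi-line reasoning text stays whole.
# \s+ absorbs the blank line between the two fields.
pattern = r"(?s)Was it Accurate\?:\s*(Yes|No)\s+Reasoning:\s*(.*)"

# With two capture groups, str.extract returns a DataFrame with columns 0 and 1.
extracted = df["text_col"].str.extract(pattern)
extracted.columns = ["Was it Accurate?", "Reasoning"]

# Keep the id column and attach the two extracted columns next to it.
result = pd.concat([df[["id"]], extracted], axis=1)
print(result)
```

Rows that do not match the pattern come back as NaN in both new columns, so a quick `result.isna().sum()` is a cheap way to spot text that follows a different layout.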
