Split a pandas DataFrame column into multiple columns based on the text it contains


I have a pandas DataFrame with a text column:

id    text_col
1     Was it Accurate?: Yes\n\nReasoning: This is a sample text
2     Was it Accurate?: Yes\n\nReasoning: This is a sample text
3     Was it Accurate?: No\n\nReasoning: This is a sample text

I need to split text_col into two columns:

"Was it Accurate?"
"Reasoning"

The final DataFrame should look like this:

id    Was it Accurate?    Reasoning
1     Yes                 This is a sample text
2     Yes                 This is a sample text
3     No                  This is a sample text

I tried splitting text_col on "\n\nReasoning:", but did not get the desired result.

df[['Was it Accurate?','Reasoning']] = df['text_col'].str.split("\n\nReasoning:")

python python-3.x pandas dataframe
1 Answer

Assuming the text always follows the same fixed template, you can use str.extract with capture groups:

df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(r'Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)')

Output:

   id                                                   text_col Was it Accurate?              Reasoning
0   1  Was it Accurate?: Yes\n\nReasoning: This is a sample text              Yes  This is a sample text
1   2  Was it Accurate?: Yes\n\nReasoning: This is a sample text              Yes  This is a sample text
2   3   Was it Accurate?: No\n\nReasoning: This is a sample text               No  This is a sample text
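
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the str.extract approach. The sample frame is rebuilt from the question, and it assumes the cells contain real newline characters rather than a literal backslash followed by "n". Dropping text_col at the end is an extra step, added only to match the layout requested in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: real newlines in the cells).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: No\n\nReasoning: This is a sample text",
    ],
})

# Each capture group becomes one output column.
# "?" is escaped because it is a regex metacharacter.
pattern = r"Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)"
df[["Was it Accurate?", "Reasoning"]] = df["text_col"].str.extract(pattern)

# Drop the original column to get the layout asked for in the question.
result = df.drop(columns="text_col")
print(result)
```

Rows that do not match the pattern come back as NaN in both new columns, so a quick check with result.isna() can reveal text that deviates from the template.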

If you want a more generic approach, you can combine str.extractall with pivot:

out = (df['text_col'].str.extractall(r'([^\n:]+): *([^\n:]+)')
       .droplevel(1).pivot(columns=0, values=1)
      )

Output:

               Reasoning Was it Accurate?
0  This is a sample text              Yes
1  This is a sample text              Yes
2  This is a sample text               No
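
The generic variant can also be sketched end to end, including a join back onto the id column. This is a sketch under a few assumptions: the sample data is rebuilt from the question, the cells contain real newlines, and the named groups key and value are illustrative choices that do not appear in the original answer. They only give the intermediate columns readable names.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: real newlines in the cells).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: No\n\nReasoning: This is a sample text",
    ],
})

# One row per "label: text" pair; "key" and "value" are hypothetical group names.
pairs = df["text_col"].str.extractall(r"(?P<key>[^\n:]+): *(?P<value>[^\n:]+)")

# Drop the per-row match counter, then turn each label into its own column.
wide = (
    pairs.droplevel("match")
         .pivot(columns="key", values="value")
         .rename_axis(columns=None)
)

# Align on the original row index to bring the id column back.
result = df[["id"]].join(wide)
print(result)
```

Because the pattern excludes colons from both groups, a value that itself contains a colon would be cut short or split into an extra pair, so this form suits simple "label: text" lines best.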