I have a pandas DataFrame with the following columns:
id text_col
1 Was it Accurate?: Yes\n\nReasoning: This is a sample text
2 Was it Accurate?: Yes\n\nReasoning: This is a sample text
3 Was it Accurate?: No\n\nReasoning: This is a sample text
I need to split text_col into two columns, "Was it Accurate?" and "Reasoning".
The final DataFrame should look like this:
id Was it Accurate? Reasoning
1 Yes This is a sample text
2 Yes This is a sample text
3 No This is a sample text
I tried splitting text_col on "Reasoning:", but did not get the desired result:
df[['Was it Accurate?','Reasoning']] = df['text_col'].str.split("\n\nReasoning:")
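A likely reason the attempt above does not work: without expand=True, str.split returns a single Series of lists, which cannot be assigned to two columns at once, and the "Was it Accurate?: " prefix would remain in the first piece anyway. The following is a minimal sketch of a split-based fix, assuming text_col contains real newline characters and that every row follows the layout in the sample data (the DataFrame here is rebuilt from the question's example).

```python
import pandas as pd

# Sample data rebuilt from the question (assumes real newlines in text_col)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: No\n\nReasoning: This is a sample text",
    ],
})

# expand=True returns a DataFrame with one column per piece;
# n=1 splits only on the first occurrence of the separator
parts = df["text_col"].str.split("\n\nReasoning: ", n=1, expand=True)

# Strip the literal label from the first piece (regex=False: plain text match)
df["Was it Accurate?"] = parts[0].str.replace("Was it Accurate?: ", "", regex=False)
df["Reasoning"] = parts[1]

result = df.drop(columns="text_col")
print(result)
```

This keeps the split approach from the question; the regex-based answers handle the label and the value in a single step.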
You can use str.extract with capture groups:
df[['Was it Accurate?', 'Reasoning']] = df['text_col'].str.extract(r'Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)')
Output:
id text_col Was it Accurate? Reasoning
0 1 Was it Accurate?: Yes\n\nReasoning: This is a sample text Yes This is a sample text
1 2 Was it Accurate?: Yes\n\nReasoning: This is a sample text Yes This is a sample text
2 3 Was it Accurate?: No\n\nReasoning: This is a sample text No This is a sample text
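One caveat with the str.extract pattern: by default `.` does not match a newline, so `(.*)` captures only the first line of the reasoning. If the reasoning can span several lines, passing flags=re.DOTALL is one way to capture all of it. This is a sketch under that assumption; the multi-line row is made up for illustration and does not come from the question.

```python
import re
import pandas as pd

# Hypothetical data: the first row has a two-line reasoning
df = pd.DataFrame({
    "id": [1, 2],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: First line\nSecond line",
        "Was it Accurate?: No\n\nReasoning: This is a sample text",
    ],
})

pattern = r"Was it Accurate\?: (Yes|No)\n\nReasoning: (.*)"

# re.DOTALL lets "." match newlines, so (.*) runs to the end of the string
df[["Was it Accurate?", "Reasoning"]] = df["text_col"].str.extract(
    pattern, flags=re.DOTALL
)

out = df.drop(columns="text_col")
print(out)
```

Rows that do not match the pattern come back as NaN in both new columns, which makes malformed entries easy to spot.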
Alternatively, use str.extractall and pivot:
out = (df['text_col'].str.extractall(r'([^\n:]+): *([^\n:]+)')
.droplevel(1).pivot(columns=0, values=1)
)
Output:
Reasoning Was it Accurate?
0 This is a sample text Yes
1 This is a sample text Yes
2 This is a sample text No
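Note that the pivoted result no longer contains id, and its columns come out in alphabetical order. A possible way to get the layout requested in the question is to reorder the columns and join back on the index. This is a sketch, assuming the original index is unchanged and using sample data rebuilt from the question; the name final is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text_col": [
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: Yes\n\nReasoning: This is a sample text",
        "Was it Accurate?: No\n\nReasoning: This is a sample text",
    ],
})

# One row per "key: value" pair, then one column per key
out = (df["text_col"].str.extractall(r"([^\n:]+): *([^\n:]+)")
       .droplevel(1)                      # drop the match-number level
       .pivot(columns=0, values=1)
       .rename_axis(columns=None))        # remove the leftover columns name

# Reorder the columns and attach them to id via the shared index
final = df[["id"]].join(out[["Was it Accurate?", "Reasoning"]])
print(final)
```

The generic pattern treats every "key: value" line as a column, which is convenient when the keys vary, but reasoning text that itself contains a colon may be truncated. In that case the explicit str.extract pattern is probably the safer choice.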