I have two dataframes, A and B. Dataframe A has a column named Comments, and dataframe B has a column named Solution.
Here is the data in those two columns of df_A and df_B:
import pandas as pd

df_A = pd.DataFrame({'Comments':
    """
Repaired loose connection
No ice or water dispensing
No lights on the control panel"""},
    index=[0])
df_B = pd.DataFrame({'Solution': ['A, B & C : Control panel not working: loose electrical connector',
                                  'D,E & F: Not cooling : loose electrical connector']})
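What I need is code that reads each word in the Comments column, searches for it in the Solution column, and fills an "Answer" column in df_A with the matching solution from df_B.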
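"Reading each word" can be sketched as splitting both texts into lowercase word sets and counting how many words each solution shares with the comment. The helper name `tokens` and the regex `[a-z]+` are assumptions for illustration; the regex drops punctuation, so `working:` becomes `working`.

```python
import re

def tokens(text):
    # Hypothetical helper: lowercase the text and keep only runs of letters
    return set(re.findall(r"[a-z]+", text.lower()))

comment = "Repaired loose connection\nNo ice or water dispensing\nNo lights on the control panel"
solutions = [
    "A, B & C : Control panel not working: loose electrical connector",
    "D,E & F: Not cooling : loose electrical connector",
]

comment_words = tokens(comment)
for s in solutions:
    shared = comment_words & tokens(s)  # words present in both texts
    print(len(shared), sorted(shared))
# prints:
# 3 ['control', 'loose', 'panel']
# 1 ['loose']
```

The solution with the most shared words is the candidate answer.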
Desired output:
Comments:
Repaired loose connection no ice or
Water dispensing, no lights on the control panel
Answer:
A, B & C : control panel not working: loose electrical connector.
That is the output I want.
The code below is what I tried, but it does not produce any results.
for index, row in df_B.iterrows():
    # Checks whether the *entire* Solution text occurs inside Comments
    found = df_A['Comments'].str.contains(row['Solution'], case=False)
    df_A.loc[found, 'Answer'] = row['Solution']
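The attempt finds nothing because `str.contains(row['Solution'])` asks whether the whole solution sentence appears inside the comment, which never happens with this data. A quick check, assuming the sample data from the question (`regex=False` is added here so the solution text is matched literally rather than as a regular expression):

```python
import pandas as pd

df_A = pd.DataFrame({"Comments": [
    "Repaired loose connection\nNo ice or water dispensing\nNo lights on the control panel"
]})
df_B = pd.DataFrame({"Solution": [
    "A, B & C : Control panel not working: loose electrical connector",
    "D,E & F: Not cooling : loose electrical connector",
]})

# Whole-sentence substring test, as in the attempt: never matches
for sol in df_B["Solution"]:
    print(df_A["Comments"].str.contains(sol, case=False, regex=False).any())  # False

# Searching the Solution column for a piece of the comment does match
hits = df_B["Solution"].str.contains("control panel", case=False, regex=False)
print(hits.tolist())  # [True, False]
```

So the search has to run from the comment's individual words into the solutions, not the other way round.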
@Kinnuu, from what I understand of what you are trying to do, here is one possible solution:
import pandas as pd
import itertools

df_A = pd.DataFrame({'Comments':
    """Repaired loose connection
No ice or water dispensing
No lights on the control panel"""},
    index=[0])
df_B = pd.DataFrame({'Solution': ['A, B & C : Control panel not working: loose electrical connector',
                                  'D,E & F: Not cooling : loose electrical connector']})

# take all the unique words from the comments
words = set(itertools.chain.from_iterable(map(str.split,
                                              df_A.loc[0, "Comments"].split("\n"))))
scores = []
# for each row keep track of the index and the total number of matching words
for index, row in df_B.iterrows():
    # use split to make sure the match is on full words and lower to match on lowercase
    words_in_row = list(map(str.lower,
                            row["Solution"].split(" ")))
    scores.append((index,
                   len([word for word in words if word.lower() in words_in_row])))
# pick the (index, score) pair with the most matching words (max keeps the first on ties)
high_score = max(scores, key=lambda x: x[-1])
# put the matching solution in as the answer (index is a label, so use .loc)
df_A["Answer"] = df_B.loc[high_score[0], "Solution"]
For now this only works for a single row in df_A. Turning it into a function should not be too much trouble, though.
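One way to extend the answer to every row of df_A, sketched under the same word-overlap idea: a hypothetical `best_solution` function scores each solution by the number of lowercase words it shares with a comment and returns the best one. Ties go to the first solution, and a comment sharing no words gets `None`; both are assumptions. The second comment below is made-up data, added only to show more than one row.

```python
import re
import pandas as pd

def tokens(text):
    # Lowercase alphabetic words only, so "working:" becomes "working"
    return set(re.findall(r"[a-z]+", str(text).lower()))

def best_solution(comment, solutions):
    """Hypothetical helper: return the solution sharing the most words with comment."""
    comment_words = tokens(comment)
    best, best_score = None, 0
    for sol in solutions:
        score = len(comment_words & tokens(sol))
        if score > best_score:  # strict '>' keeps the first solution on ties
            best, best_score = sol, score
    return best

df_A = pd.DataFrame({"Comments": [
    "Repaired loose connection\nNo ice or water dispensing\nNo lights on the control panel",
    "Unit not cooling",  # made-up second comment for illustration
]})
df_B = pd.DataFrame({"Solution": [
    "A, B & C : Control panel not working: loose electrical connector",
    "D,E & F: Not cooling : loose electrical connector",
]})

solutions = df_B["Solution"].tolist()
# Series.apply forwards extra keyword arguments to the function
df_A["Answer"] = df_A["Comments"].apply(best_solution, solutions=solutions)
print(df_A["Answer"].tolist())
```

The first comment picks the control-panel solution (3 shared words against 1); the made-up second comment picks the cooling solution (2 against 1).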