Search a column for substrings that match values in another column

Question

Suppose I have a table that looks like this:

Value	Longer_Value	Third_Value
substring	wrong_cell_sorry	wrong_match
othertext	substring_here	match_this1
othertext	secondone_here	match_this2
secondone	substring_there	match_this3

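If a value from the Value column is found as a substring anywhere in the Longer_Value column, I want to fill a "Result" column with the Third_Value from the row where it was found. If more than one match is found, I want to list them all, joined by some separator.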

So the resulting table would look like this:

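Value	Longer_Value	Third_Value	Result
substring	wrong_cell_sorry	wrong_match	match_this1, match_this3
othertext	substring_here	match_this1	no_match
othertext	secondone_here	match_this2	no_match
secondone	substring_there	match_this3	match_this2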

I have been trying various combinations of .loc and str.contains (with regex disabled), but so far without any luck. Any ideas on how to solve this?

Answer 1

Would the following solve your problem?

# Function to find matches and update the Result column
def find_value(row, df):
    matches = df[df['Longer_Value'].str.contains(row['Value'], regex=False)]['Third_Value'].tolist()
    return ', '.join(matches) if matches else 'no_match'

# Apply the function to each row in the DataFrame
df['Result'] = df.apply(lambda row: find_value(row, df), axis=1)
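To try the approach above end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is loaded into a DataFrame named df with the column names Value, Longer_Value and Third_Value; the "no_match" placeholder and the ", " separator are simply the ones used in this answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Value": ["substring", "othertext", "othertext", "secondone"],
    "Longer_Value": ["wrong_cell_sorry", "substring_here",
                     "secondone_here", "substring_there"],
    "Third_Value": ["wrong_match", "match_this1", "match_this2", "match_this3"],
})

def find_value(row, df):
    # Literal (non-regex) substring test of this row's Value
    # against every entry of Longer_Value.
    mask = df["Longer_Value"].str.contains(row["Value"], regex=False)
    matches = df.loc[mask, "Third_Value"].tolist()
    return ", ".join(matches) if matches else "no_match"

# One full scan of Longer_Value per row.
df["Result"] = df.apply(lambda row: find_value(row, df), axis=1)
print(df["Result"].tolist())
```

Note that this scans the whole Longer_Value column once per row, so the cost grows roughly with the square of the number of rows.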

Answer 2

There is no truly efficient way to do this. You can loop over the unique values and use str.contains to perform the search:

# regex=False treats each value as a literal substring, as the question asks
mapper = {v: ', '.join(df.loc[df['Longer_Value'].str.contains(v, regex=False), 'Third_Value'])
          for v in df['Value'].unique()}

df['Result'] = df['Value'].map(mapper).replace('', 'no_match')

Output:

       Value      Longer_Value  Third_Value                    Result
0  substring  wrong_cell_sorry  wrong_match  match_this1, match_this3
1  othertext    substring_here  match_this1                  no_match
2  othertext    secondone_here  match_this2                  no_match
3  secondone   substring_there  match_this3               match_this2
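The question mentions disabling regex. As a small illustration of why that can matter with str.contains, the sketch below uses a hypothetical Series (not part of the question's data) whose search value contains a regex metacharacter. With the sample data above both modes give the same result, so this only becomes relevant if the values may contain characters such as ".", "(" or "+".

```python
import pandas as pd

# Hypothetical values, chosen only to show the difference.
s = pd.Series(["a.b_here", "axb_here"])

# Default (regex=True): "." matches any character, so both rows match.
as_regex = s.str.contains("a.b").tolist()

# regex=False: "a.b" is searched for literally, so only the first row matches.
as_literal = s.str.contains("a.b", regex=False).tolist()

print(as_regex)    # [True, True]
print(as_literal)  # [True, False]
```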
