Extracting the text between two strings in a Python DataFrame column


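I'm new to Python and don't know much yet, and I need help with a problem I've run into. I have a DataFrame with a text column, "item", and I need to pull out the text between two strings (for example "notify" and "accordingly"). I tried the following, but the output is blank: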

start = 'to notify'
end = 'accordingly'
data_1['match'] = data_1['Issue'].apply(lambda x: "".join(x for x in x.split() if re.search(('%s(.*)%s' % (start, end)),x)))

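I also tried re.findall, but it complains that it expected a string or bytes-like object. I tried to convert the variable from object to string, but that did not work either. If anyone can help me solve this, it would be very helpful...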

python string extract
1 Answer

I had some trouble reading your code, but this snippet should do what I understood you want (get the text between the start and end strings):

import pandas as pd
import re

start = 'to notify'
end = 'accordingly'

# Auxiliary function that returns None when the pattern
# start - text - end is not found, or when the cell is not a string (e.g. NaN)
def extract_between(x, start, end):
    if not isinstance(x, str):
        return None
    # re.escape keeps special characters in start/end from being read as regex syntax
    found = re.search(r'{}(.*){}'.format(re.escape(start), re.escape(end)), x)
    return found.group(1) if found else None

# This is just an example; if it does not work for your purpose, please share some data
df = pd.DataFrame(['to notify TEXT accordingly', 'this should not match'], columns=['issue'])
# Write the result to a new column so the original text is kept
df['match'] = df['issue'].apply(extract_between, start=start, end=end)

print(df['match'])
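
As an alternative to calling a Python function on every row, the same extraction can probably be done with pandas' built-in `Series.str.extract`. This is a minimal sketch and not necessarily the best fit for your data. The sample rows are made up for illustration, and the column name `Issue` is taken from the code in the question. Missing values simply come back as NaN, which may also avoid the "expected string or bytes-like object" error mentioned in the question, assuming that error was caused by non-string cells.

```python
import re
import pandas as pd

start = 'to notify'
end = 'accordingly'

# Made-up sample data; the None row stands in for a missing value
df = pd.DataFrame({'Issue': [
    'Please ask the team to notify the vendor accordingly.',
    'this should not match',
    None,
]})

# Escape the markers so they are matched literally; (.*?) is non-greedy,
# so the capture stops at the first occurrence of `end`
pattern = re.escape(start) + r'\s*(.*?)\s*' + re.escape(end)

# expand=False returns a Series when the pattern has a single group
df['match'] = df['Issue'].str.extract(pattern, expand=False)

print(df)
```

Rows without both markers get NaN in `match`. If the text can span several lines, passing `flags=re.DOTALL` to `str.extract` should let `.` match newlines as well.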