Regex variations

Question  Votes: 0  Answers: 2

I want to extract the length of a garment from a pandas DataFrame. A row of the DataFrame looks like this:

A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4

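You can see that the length appears between "About" and "shoulder", but in some cases "shoulder" is replaced by "waist", "hem", and so on. Below is my Python script for finding the length. Because I slice the matched text at fixed positions, it fails when, for example, there is a comma after "About".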

import re

def regexfinder(string_var):

    res=''

    x=re.search(r"(?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips])", string_var).group(0)
    tohave=int(x[1:3])

    if tohave >=16 and tohave<=36:
        res="Mini"
        return res

    if tohave>36 and tohave<40:
        res="Above the Knee"
        return res

    if tohave >=40 and tohave<=46:
        res="Knee length"
        return res

    if tohave>46 and tohave<49:
        res="Mid/Tea length"
        return res

    if tohave >=49 and tohave<=59:
        res="Long/Maxi length"
        return res

    if tohave>59:
        res="Floor Length"
        return res
python regex python-3.x
2 Answers
1
vote

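Your regex (?<=About).*?(?=[shoulder,waist,hem,bust,neck,bust,top,hips]) uses a character class. A character class matches any single character from the set (s, h, o, u, l, d, e, r, the comma, and so on), not the whole words shoulder, waist, hem, bust, neck, top, or hips.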

I think you want to use alternation with | and put the words in a non-capturing group.
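To make the difference concrete, here is a minimal sketch that runs both lookaheads against a shortened version of the sample row. The variable names are only illustrative, and the duplicated "bust" from the question's pattern is dropped.

```python
import re

text = 'About 23" from shoulder to hem'

# Character class: the lookahead is satisfied by ONE character from the set,
# so the lazy match stops just before the "r" in "from".
char_class = re.search(
    r"(?<=About).*?(?=[shoulder,waist,hem,bust,neck,top,hips])", text
)
print(repr(char_class.group(0)))   # ' 23" f'

# Alternation in a non-capturing group: the lookahead needs a whole word,
# so the lazy match runs up to "shoulder".
alternation = re.search(
    r"(?<=About).*?(?=(?:shoulder|waist|hem|bust|neck|top|hips))", text
)
print(repr(alternation.group(0)))  # ' 23" from '
```

The character-class version only appears to work on this row because the two characters at positions 1 and 2 of the match happen to be the digits.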

With an optional comma ,? you could try it like this:

(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))

The length is in the first capturing group.
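A minimal sketch of how this kind of pattern might be used from Python. The names LENGTH_RE and extract_length are hypothetical, and the word list is simply the one from the question, so it may need extending for other rows.

```python
import re

# Assumed word list, taken from the question; extend it as the data requires.
LENGTH_RE = re.compile(
    r"(?<=About),? (\d+)(?=.*?(?:shoulder|waist|hem|bust|neck|top|hips))"
)


def extract_length(description):
    """Return the length in inches as an int, or None if nothing matches."""
    match = LENGTH_RE.search(description)
    return int(match.group(1)) if match else None


print(extract_length('Long sleeves | About 23" from shoulder to hem'))  # 23
print(extract_length('Crewneck | About, 31" from shoulder to hem'))     # 31
print(extract_length('Crewneck | Dry clean'))                           # None
```

Returning None for rows without a match avoids the AttributeError that calling .group() on a failed re.search would raise. On a DataFrame, a function like this could be applied to the description column with Series.apply.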


1
vote
import re

s = """A-line dress with darting at front and back | Surplice neckline | Long sleeves | About 23" from shoulder to hem | Triacetate/polyester | Dry clean | Imported | Model shown is 5'10" (177cm) wearing a size 4"""
q = """Velvet dress featuring mesh front, back and sleeves | Crewneck | Long bell sleeves | Self-tie closure at back cutout | About, 31" from shoulder to hem | Viscose/nylon | Hand wash | Imported | Model shown is 5'10" (177cm) wearing a size Small."""


def getSize(stringVal, strtoCheck):
    for i in stringVal.split("|"):          # Split the description on "|"
        val = i.strip()
        if val.startswith(strtoCheck):      # Keep the field that starts with "About"
            match = re.search(r"\d+", val)  # First run of digits in that field
            if match:
                return match.group(0)       # Returned as a string, e.g. "23"
    return None                             # No matching field found


# The question is tagged python-3.x, so print is called as a function
print(getSize(s, "About"))
print(getSize(q, "About"))

Output:

23
31
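The extracted number still has to be mapped to the length categories from the question. One possible sketch replaces the chain of if statements with a lookup table; the names UPPER_BOUNDS, LABELS, and length_category are hypothetical, the thresholds are copied from the question, and whole-inch values are assumed.

```python
from bisect import bisect_left

# Inclusive upper bounds in whole inches, copied from the question's if-chain.
# Anything below 16 has no category there, so it maps to None.
UPPER_BOUNDS = [15, 36, 39, 46, 48, 59]
LABELS = [
    None,
    "Mini",
    "Above the Knee",
    "Knee length",
    "Mid/Tea length",
    "Long/Maxi length",
    "Floor Length",
]


def length_category(inches):
    """Map a whole-inch length to the category labels used in the question."""
    return LABELS[bisect_left(UPPER_BOUNDS, inches)]


print(length_category(int("23")))  # Mini
print(length_category(int("31")))  # Mini
print(length_category(47))         # Mid/Tea length
```

Keeping the thresholds in one list makes the boundaries easier to check than six separate comparisons.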