I have some raw data that looks like the following DataFrame:
import pandas as pd

df = pd.DataFrame([
    {'var1': '220-224 (Even) roadname1', 'var2': 'location 1', 'var3': 'area 1'},
    {'var1': 'site of 5 to 9 (odd) roadname2', 'var2': 'location 2', 'var3': 'area 2'},
    {'var1': '16, 19 roadname3', 'var2': 'location 3', 'var3': 'area 3'},
])
df
                             var1        var2    var3
0        220-224 (Even) roadname1  location 1  area 1
1  site of 5 to 9 (odd) roadname2  location 2  area 2
2                16, 19 roadname3  location 3  area 3
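I want to write a function that splits the var1 string so that each number it indicates becomes a separate row of the DataFrame, giving output like this: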
df = pd.DataFrame([{'var1': '220 roadname1', 'var2': 'location 1', 'var3': 'area 1'},
{'var1': '222 roadname1', 'var2': 'location 1', 'var3': 'area 1'},
{'var1': '224 roadname1', 'var2': 'location 1', 'var3': 'area 1'},
{'var1': '5 roadname2', 'var2': 'location 2', 'var3': 'area 2'},
{'var1': '7 roadname2', 'var2': 'location 2', 'var3': 'area 2'},
{'var1': '9 roadname2', 'var2': 'location 2', 'var3': 'area 2'},
{'var1': '16 roadname3', 'var2': 'location 3', 'var3': 'area 3'},
{'var1': '19 roadname3', 'var2': 'location 3', 'var3': 'area 3'},]
)
df
            var1        var2    var3
0  220 roadname1  location 1  area 1
1  222 roadname1  location 1  area 1
2  224 roadname1  location 1  area 1
3    5 roadname2  location 2  area 2
4    7 roadname2  location 2  area 2
5    9 roadname2  location 2  area 2
6   16 roadname3  location 3  area 3
7   19 roadname3  location 3  area 3
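The strings vary somewhat in letter case and in how the number ranges are written, and I am not sure whether there is an efficient way to handle these variations.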
Use a custom function to expand the ranges (the example below uses regular expressions), then call explode:
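import re

def parse_range(s):
    # "A-B" or "A to B", optionally followed by "(even)" or "(odd)"
    pat1 = r'^\D*(\d+)(?:-|\s+to\s+)(\d+)(?:\s*\((even|odd)\))?\s*(.*)$'
    # a list of numbers separated by commas and/or spaces
    pat2 = r'^\D*([\d ,]+)\s*(.*)$'
    # IGNORECASE matches "(Even)" or "TO" without lowercasing the road name
    m1 = re.search(pat1, s, flags=re.IGNORECASE)
    if m1:
        start, stop, end = int(m1.group(1)), int(m1.group(2)), m1.group(4)
        step = 1
        if m1.group(3):
            step = 2
            wanted = 1 if m1.group(3).lower() == 'odd' else 0
            if start % 2 != wanted:
                start += 1  # move the start onto the requested parity
        return [f'{i} {end}' for i in range(start, stop + 1, step)]
    m2 = re.search(pat2, s)
    if m2:
        end = m2.group(2)
        return [f'{i} {end}' for i in re.findall(r'\d+', m2.group(1))]
    return [s]  # neither pattern matched: keep the value unchanged

out = (df.assign(var1=df['var1'].map(parse_range))
         .explode('var1')
      )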
Output:
            var1        var2    var3
0  220 roadname1  location 1  area 1
0  222 roadname1  location 1  area 1
0  224 roadname1  location 1  area 1
1    5 roadname2  location 2  area 2
1    7 roadname2  location 2  area 2
1    9 roadname2  location 2  area 2
2   16 roadname3  location 3  area 3
2   19 roadname3  location 3  area 3
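Note that the exploded frame keeps the original row labels (0, 0, 0, 1, ...), whereas the desired output in the question is numbered 0 to 7. If that matters, the rows can be renumbered. The following is only a minimal sketch: the small frame `demo` is a made-up stand-in for the data after the parsing step, and the `ignore_index` argument assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical stand-in for the frame after var1 has been turned into lists
demo = pd.DataFrame({
    'var1': [['220 roadname1', '222 roadname1'], ['5 roadname2']],
    'var2': ['location 1', 'location 2'],
})

# explode repeats the original row label for every list element
kept = demo.explode('var1')

# ignore_index=True (pandas >= 1.1) renumbers the rows from 0
renumbered = demo.explode('var1', ignore_index=True)

# alternative that also works on older versions: drop the repeated labels
renumbered_alt = demo.explode('var1').reset_index(drop=True)
```

Either form leaves the column values untouched and only changes the row labels.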