How can I extract the area values from the address column of the following DataFrame?
address quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 5 45
Note that the area is the number that comes right before ㎡ or square metre.
The desired output would look like this:
address area quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 206.0 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 115.0 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 39.0 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 470.0 5 45
Use str.extract.
Example:
import pandas as pd
df = pd.DataFrame({'address': ['711-2880 Nulla St. Mankato Mississippi 96.5㎡', 'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡', '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616', 'Ap #867-859 Sit Rd. Azusa New York 39 square metre', '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392']})
# Capture the number directly before ㎡ or "square metre"; the lookahead keeps the unit out of the match.
df['area'] = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)", expand=False)
print(df)
Output:
address area
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska... 206
2 606-3727 Ullamcorper. Street Roseville NH 115㎡... 115
3 Ap #867-859 Sit Rd. Azusa New York 39 square m... 39
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492... 470
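Note that str.extract returns strings, which is why the output above shows 206 rather than the 206.0 in the desired result. If a numeric column is needed, one possible follow-up is to pass the extracted values through pd.to_numeric. The sketch below is only an illustration under some assumptions: it uses a shortened copy of the data, and the last row is a hypothetical address with no area, added to show that unmatched rows end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "711-2880 Nulla St. Mankato Mississippi 96.5㎡",
        "Ap #867-859 Sit Rd. Azusa New York 39 square metre",
        "7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392",
        "No area given here",  # hypothetical row, not part of the question's data
    ]
})

# Same pattern as in the answer: a number followed by ㎡ or "square metre".
pattern = r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)"

# expand=False returns a Series of strings (NaN where nothing matched).
extracted = df["address"].str.extract(pattern, expand=False)

# Convert the strings to numbers; anything unparseable becomes NaN.
df["area"] = pd.to_numeric(extracted, errors="coerce")

print(df["area"].tolist())
```

With errors="coerce", rows without a recognizable area stay as NaN instead of raising an error, so they can be filtered or filled later.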