How to extract keywords (strings) from a column of a pandas DataFrame in Python

Question

I have a DataFrame df that looks like this:

         id                        Type                        agent_id  created_at
0       44525   Stunning 6 bedroom villa in New Delhi               184  2018-03-09
1       44859   Villa for sale in Amritsar                          182  2017-02-19
2       45465   House in Faridabad                                  154  2017-04-17
3       50685   5 Hectre land near New Delhi                        113  2017-09-01
4      130728   Duplex in Mumbai                                    157  2017-02-07
5      130856   Large plot with fantastic views in Mumbai           137  2018-01-16
6      130857   Modern Design Penthouse in Bangalore                199  2017-03-24

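I am trying to clean this tabular data by extracting keywords from the Type column, and to build a new DataFrame with the resulting columns. The keywords are: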

Apartment  = ['apartment', 'penthouse', 'duplex']
House      = ['house', 'villa', 'country estate']
Plot       = ['plot', 'land']
Location   = ['New Delhi','Mumbai','Bangalore','Amritsar']

So the desired DataFrame should look like this:

         id      Type        Location    agent_id  created_at
0       44525   House       New Delhi         184  2018-03-09
1       44859   House        Amritsar         182  2017-02-19
2       45465   House       Faridabad         154  2017-04-17
3       50685   Plot        New Delhi         113  2017-09-01
4      130728   Apartment      Mumbai         157  2017-02-07
5      130856   Plot           Mumbai         137  2018-01-16
6      130857   Apartment   Bangalore         199  2017-03-24

This is what I have tried so far:

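import pandas as pd
df = pd.read_csv('test_data.csv')

# I can extract these keywords one by one with for loops, but how
# can I do this in pandas with the fewest possible lines of code?

# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, value in df['Type'].items():
    for keyword in Apartment:
        # the keywords are lowercase, so compare against the lowercased text
        if keyword in value.lower():
            print(keyword)

df_new = pd.DataFrame(df['id'])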

Can someone tell me how to solve this?

Answer 1

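First create the Location column with str.extract, joining the location names with | as the regex OR:

pat = '|'.join(r"\b{}\b".format(x) for x in Location)
df['Location'] = df['Type'].str.extract('('+ pat + ')', expand=False)

Then create a dictionary from the other lists, swap the keys with the values, and set the values in a loop through a mask built with str.contains and one of its parameters.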
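The two steps can be put together as a minimal, self-contained sketch. The sample frame is rebuilt by hand from the question instead of being read from test_data.csv, the names d and d1 are illustrative, and case=False is an assumption about which str.contains parameter is meant (the titles are capitalized while the keywords are lowercase).

```python
import pandas as pd

# Sample data copied from the question (only the columns that matter here)
df = pd.DataFrame({
    "id": [44525, 44859, 45465, 50685, 130728, 130856, 130857],
    "Type": [
        "Stunning 6 bedroom villa in New Delhi",
        "Villa for sale in Amritsar",
        "House in Faridabad",
        "5 Hectre land near New Delhi",
        "Duplex in Mumbai",
        "Large plot with fantastic views in Mumbai",
        "Modern Design Penthouse in Bangalore",
    ],
})

Apartment = ['apartment', 'penthouse', 'duplex']
House = ['house', 'villa', 'country estate']
Plot = ['plot', 'land']
Location = ['New Delhi', 'Mumbai', 'Bangalore', 'Amritsar']

# Step 1: one alternation pattern, each name wrapped in word boundaries
pat = '|'.join(r"\b{}\b".format(x) for x in Location)
df['Location'] = df['Type'].str.extract('(' + pat + ')', expand=False)

# Step 2: map category -> keywords, then invert it to keyword -> category
d = {'Apartment': Apartment, 'House': House, 'Plot': Plot}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Overwrite Type with the category wherever the keyword occurs
# (case=False is assumed so that 'villa' also matches 'Villa')
for k, v in d1.items():
    df.loc[df['Type'].str.contains(k, case=False), 'Type'] = v

print(df)
```

Two things to keep in mind with this sketch. 'Faridabad' is not in the Location list, so that row gets NaN rather than the value shown in the desired output. The loop also depends on keyword order: 'penthouse' is handled before 'house', so the Bangalore row is already 'Apartment' by the time 'house' is checked; with the order reversed, 'house' would match inside 'Penthouse' first.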

Answer 2

    106         if isna(key).any():
--> 107             raise ValueError('cannot index with vector containing '
    108                              'NA / NaN values')
    109         return False

ValueError: cannot index with vector containing NA / NaN values

I am getting the above error.
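One likely cause, assuming the real data has missing values in the Type column (the commenter's data is not shown): str.contains returns NaN for missing entries by default, and a mask containing NaN cannot be used for indexing. Passing na=False treats missing values as "no match". A small sketch with a made-up three-row frame and a shortened keyword map:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing title
df = pd.DataFrame({"Type": ["Villa for sale in Amritsar", np.nan, "Duplex in Mumbai"]})

# Shortened keyword -> category map, for illustration only
d1 = {'villa': 'House', 'duplex': 'Apartment'}

for k, v in d1.items():
    # na=False makes missing values count as non-matches,
    # so the mask is purely boolean and safe to use with .loc
    mask = df['Type'].str.contains(k, case=False, na=False)
    df.loc[mask, 'Type'] = v

print(df)
```

The row with the missing title is left untouched; it can be dropped or filled separately, for example with dropna or fillna, depending on what the cleaned data should contain.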
