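I am new to Python, and this is my first post on Stack Overflow. I have a list of keywords and a DataFrame with several columns.
I want to search one specific column for these keywords and, for each row where a keyword occurs, write out that keyword next to it.
This is what I am doing (My code).
This is the error I get (The loop with the error).
This is what I want (Desired output).
Please help me find the problem or suggest a solution. Thanks! In case it makes things easier, the code is written out below.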
import pandas as pd
keywords = ["hello","hi","greetings","wassup"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
df['Keywords'] = ""
df2 = pd.DataFrame(data = None, columns = df.columns)
for word in keywords:
    temp = df[df['strings'].str.contains(word,na = False)]
    temp.reset_index(drop = True)
    temp['Keywords'] = word
    df2.append(temp)
Error:
C:\Users\harka\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
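For what it's worth, the warning is raised because `temp` is a filtered slice of `df`, and a column is then assigned on that slice. Two further things in the loop look like they would silently do nothing: `reset_index` and `append` both return a new object rather than modifying in place, and their results are discarded (`DataFrame.append` was also removed in pandas 2.0). Below is a minimal sketch of how the loop could be repaired, assuming the desired output is one row per keyword match and that matching should ignore case. The `matches` list is a name introduced here for illustration.

```python
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data, columns=["strings"])

matches = []  # hypothetical helper list: one filtered frame per keyword
for word in keywords:
    # case=False makes the substring search case-insensitive.
    # .copy() yields an independent frame, so the assignment below
    # no longer targets a slice of df.
    temp = df[df["strings"].str.contains(word, case=False, na=False)].copy()
    temp["Keywords"] = word
    matches.append(temp)

# Combine everything once at the end and renumber the rows.
df2 = pd.concat(matches, ignore_index=True)
```

Collecting the pieces in a list and calling `pd.concat` once avoids rebuilding `df2` on every iteration.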
I added 'yo' to the keywords to show that this can return more than one keyword for a single row.
import pandas as pd
def keyword(row):
    strings = row['strings']
    keywords = ["hello","hi","greetings","wassup",'yo']
    # case-insensitive substring check; keeps every keyword found in the row
    found = [key for key in keywords if key.upper() in strings.upper()]
    return found
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
df['keyword'] = df.apply(keyword, axis=1)
If you would rather not get back a list of strings, perhaps a comma-separated string instead?
import pandas as pd
def keyword(row):
    strings = row['strings']
    keywords = ["hello","hi","greetings","wassup",'yo']
    # case-insensitive substring check; keeps every keyword found in the row
    found = [key for key in keywords if key.upper() in strings.upper()]
    # join the matches into one string; rows with no match get ''
    return ','.join(found)
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
df['keyword'] = df.apply(keyword, axis=1)
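One caveat with plain substring checks is that a short keyword such as "hi" would also match inside longer words like "this". As a possible alternative, here is a sketch that uses `Series.str.findall` with a word-boundary regex instead of a row-wise `apply`. This is a swapped-in technique, not the approach above, and it assumes whole-word matching is what you want. The names `pattern` and `found` are introduced here for illustration.

```python
import re
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup", "yo"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data, columns=["strings"])

# \b anchors each keyword at word boundaries; re.escape protects against
# keywords that happen to contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"

# findall returns, per row, the list of matched words as they appear in the text.
found = df["strings"].str.findall(pattern, flags=re.IGNORECASE)

# Lower-case the matches and join them into a comma-separated string.
df["keyword"] = found.apply(lambda ms: ",".join(m.lower() for m in ms))
```

Note that the matches come back in the order they occur in the text (so "Yo! Wassup" gives "yo,wassup"), and a keyword that occurs twice in a row is listed twice.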