I am using Python 2.7.
I have a bunch of files (basically Outlook emails) in a directory. Example file names:
RE: We have Apple.msg
RE: Orange are in stock.msg
RE: Pick up some cabbage please.msg
I have a pandas Series:
Granny Smith Apple
High Quality Orange
Delicious soup
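How can I iterate over the directory, find the file names that contain a word from the pandas Series, and delete the files for which no match is found? In the example above, RE: Pick up some cabbage please.msg would be deleted, because only Apple and Orange are found in the pandas Series.
Edit: I want to actually delete the files in the directory for which no match is found.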
We can use str.contains:
s1[pd.Series(l).str.contains('|'.join(s.str.split().sum()))]
Out[560]:
0 RE: We have Apple.msg
1 RE: Orange are in stock.msg
dtype: object
Data input:
import pandas as pd
l = ['RE: We have Apple.msg',
     'RE: Orange are in stock.msg',
     'RE: Pick up some cabbage please.msg']
s1 = pd.Series(l)
s = pd.Series(['Granny Smith Apple', 'High Quality Orange', 'Delicious soup'])
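The expression above only selects the matching names. As a rough sketch of the other half of the question, the same boolean mask can be inverted with ~ to list the names that have no match. The names words, pattern, matched, keep and to_delete are mine, and the use of re.escape and case=False is an assumption about the desired behavior (literal, case-insensitive matching), not something the question specifies.

```python
import re
import pandas as pd

files = pd.Series(['RE: We have Apple.msg',
                   'RE: Orange are in stock.msg',
                   'RE: Pick up some cabbage please.msg'])
s = pd.Series(['Granny Smith Apple', 'High Quality Orange', 'Delicious soup'])

# Flatten the Series into single words and escape regex metacharacters,
# so that a word such as "C++" would be matched literally.
words = [re.escape(w) for phrase in s for w in phrase.split()]
pattern = '|'.join(words)  # e.g. "Granny|Smith|Apple|..."

# Boolean mask: True where the file name contains at least one word.
matched = files.str.contains(pattern, case=False, regex=True)

keep = files[matched]        # names with a match
to_delete = files[~matched]  # names without any match
```

Note that this matches substrings, so a short word such as "up" would also hit "soup"; wrapping each word in \b word boundaries is one way to tighten that if it matters.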
You can use os and listdir, and then str.contains:
import pandas as pd
from os import listdir
from os.path import isfile, join
m = '/' # your path
files_in_directory = [f for f in listdir(m) if isfile(join(m, f))]
files = pd.Series(files_in_directory)
s = pd.Series(["Granny Smith Apple",
"High Quality Orange",
"Delicious soup"])
z = pd.Series(s.str.split().sum())
files.str.contains('|'.join(z))
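The snippet above stops at the boolean mask. To actually delete the unmatched files, as the edit to the question asks, something along these lines might work. It is only a sketch using the standard library: delete_unmatched and its dry_run flag are hypothetical names I made up, the matching is case-insensitive by my own choice, and the demo runs in a temporary directory with file names that drop the colon so it also runs on Windows.

```python
import os
import re
import tempfile


def delete_unmatched(directory, phrases, dry_run=True):
    """Delete files in `directory` whose names contain none of the words
    in `phrases`. Returns the names that were (or, with dry_run=True,
    would be) deleted."""
    words = [re.escape(w) for phrase in phrases for w in phrase.split()]
    if not words:
        return []  # nothing to match against: refuse to delete everything
    pattern = re.compile('|'.join(words), re.IGNORECASE)
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not pattern.search(name):
            if not dry_run:
                os.remove(path)
            removed.append(name)
    return removed


# Demo in a throwaway directory (assumed file names, colon omitted).
demo_dir = tempfile.mkdtemp()
for name in ['RE We have Apple.msg',
             'RE Orange are in stock.msg',
             'RE Pick up some cabbage please.msg']:
    open(os.path.join(demo_dir, name), 'w').close()

phrases = ['Granny Smith Apple', 'High Quality Orange', 'Delicious soup']
would_remove = delete_unmatched(demo_dir, phrases)            # dry run only
removed = delete_unmatched(demo_dir, phrases, dry_run=False)  # really deletes
remaining = sorted(os.listdir(demo_dir))
```

Running with dry_run=True first and checking the returned list is a cheap safeguard, since os.remove does not go through the recycle bin.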
Here is the solution I found that works for me:
import os

# checklist contains the strings we want to filter on; it is assumed to be
# defined earlier as a list of keywords
checklist = [x.lower() for x in checklist]
m = r''  # path where our files are contained
new_directory = r''  # path where we will move the matched files to
for each_checklist in checklist:
    print('now checking for keyword ' + str(each_checklist))
    for root, dirs, files in os.walk(m):
        for i in files:
            if each_checklist in i.lower():
                # this moves the file from root to the target directory
                os.rename(os.path.join(root, i), os.path.join(new_directory, i))
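The loop above walks the directory tree once per keyword. A possible variation, sketched here under my own assumptions, combines the keywords into one case-insensitive regular expression and walks the tree a single time. move_matching is a hypothetical helper name, shutil.move is swapped in for os.rename so the move can also cross file systems, and the target directory is assumed to lie outside the source tree so moved files are not visited again.

```python
import os
import re
import shutil
import tempfile


def move_matching(src, dst, keywords):
    """Move every file under `src` whose name contains any keyword into
    `dst`. Returns the sorted list of moved file names."""
    if not keywords:
        return []  # an empty pattern would match every file
    pattern = re.compile('|'.join(re.escape(k) for k in keywords),
                         re.IGNORECASE)
    moved = []
    for root, dirs, files in os.walk(src):
        for name in files:
            if pattern.search(name):
                shutil.move(os.path.join(root, name),
                            os.path.join(dst, name))
                moved.append(name)
    return sorted(moved)


# Demo with temporary directories (assumed file names).
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
os.mkdir(os.path.join(src, 'sub'))
for rel in ['We have Apple.msg',
            os.path.join('sub', 'Orange are in stock.msg'),
            'Pick up some cabbage please.msg']:
    open(os.path.join(src, rel), 'w').close()

moved = move_matching(src, dst, ['apple', 'orange'])
```

Files with the same name in different subfolders would collide in the target directory, so this sketch only suits trees where the names are unique.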