Match file names in a directory against a pandas Series and delete the files that don't match

Question (votes: 0, answers: 3)

I'm using Python 2.7.

I have a bunch of files (basically Outlook emails) in a directory. Example file names:

RE: We have Apple.msg
RE: Orange are in stock.msg
RE: Pick up some cabbage please.msg

I have a pandas Series:

Granny Smith Apple
High Quality Orange
Delicious soup

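How can I iterate over the directory, find the file names that contain a word from the pandas Series, and delete the files for which no match is found? In the example above, RE: Pick up some cabbage please.msg would be deleted, while the other two files are kept because Apple and Orange appear in the pandas Series.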

Edit: I want to actually delete the files in the directory for which no match is found.

python pandas
3 Answers
1 vote

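We can use str.contains: split every entry of the Series into words, join the words into one regex alternation with '|', and keep the file names that match it.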

s1[pd.Series(l).str.contains('|'.join(s.str.split().sum()))]
Out[560]: 
0          RE: We have Apple.msg
1    RE: Orange are in stock.msg
dtype: object

Data input:

import pandas as pd

l = ['RE: We have Apple.msg',
     'RE: Orange are in stock.msg',
     'RE: Pick up some cabbage please.msg']
s1 = pd.Series(l)
s = pd.Series(['Granny Smith Apple', 'High Quality Orange', 'Delicious soup'])
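One caveat with joining the raw words with '|': the pattern is case-sensitive, a regex metacharacter inside a word (such as '.', '(' or '+') changes its meaning, and a word like 'Apple' would also match 'Pineapple'. Below is a minimal plain-Python sketch of the same filter, assuming whole-word, case-insensitive matching is what you want; the function name matching_names is hypothetical, and the flattening list comprehension plays the role of s.str.split().sum().

```python
import re


def matching_names(names, phrases):
    """Return the names that contain at least one word from `phrases`.

    `phrases` can be a list or a pandas Series of strings; each entry is
    split on whitespace and the pieces are flattened into one word list.
    """
    words = [w for phrase in phrases for w in phrase.split()]
    if not words:
        # an empty alternation would match every name, so bail out early
        return []
    # re.escape neutralises regex metacharacters; \b restricts hits to whole words
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b',
        re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


names = ['RE: We have Apple.msg',
         'RE: Orange are in stock.msg',
         'RE: Pick up some cabbage please.msg']
phrases = ['Granny Smith Apple', 'High Quality Orange', 'Delicious soup']
print(matching_names(names, phrases))
# ['RE: We have Apple.msg', 'RE: Orange are in stock.msg']
```

Drop the \b anchors if you prefer plain substring matching like str.contains; keep re.escape either way.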

1 vote

You can use os.listdir to list the files, then str.contains:

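import pandas as pd
from os import listdir
from os.path import isfile, join

m = '/'  # your path
# list only regular files (subdirectories are skipped)
files_in_directory = [f for f in listdir(m) if isfile(join(m, f))]
files = pd.Series(files_in_directory)

s = pd.Series(["Granny Smith Apple",
               "High Quality Orange",
               "Delicious soup"])

# split every entry into words and flatten them into one Series of keywords
z = pd.Series(s.str.split().sum())
# True for each file name that contains at least one keyword
files.str.contains('|'.join(z))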
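Neither answer above actually removes anything; both stop at a boolean mask. Here is a minimal sketch of the deletion step the question's edit asks for, using only the standard library (os.listdir, os.path.isfile, os.remove) and the same case-insensitive substring idea. The function name delete_unmatched and its dry_run parameter are hypothetical; it defaults to a dry run because os.remove cannot be undone.

```python
import os
import re


def delete_unmatched(directory, phrases, dry_run=True):
    """Delete files in `directory` whose names contain no word from `phrases`.

    Only the top level of `directory` is scanned. With dry_run=True nothing
    is deleted; the names that would be removed are only returned.
    """
    words = [w for phrase in phrases for w in phrase.split()]
    # case-insensitive substring match; re.escape guards against metacharacters
    pattern = re.compile('|'.join(re.escape(w) for w in words), re.IGNORECASE)
    to_delete = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # skip subdirectories and anything else that is not a regular file
        if not os.path.isfile(path):
            continue
        # with no keywords at all, every file counts as unmatched
        if not words or not pattern.search(name):
            to_delete.append(name)
    if not dry_run:
        for name in to_delete:
            os.remove(os.path.join(directory, name))
    return sorted(to_delete)
```

Call it once with dry_run=True to review the list, then again with dry_run=False. With pandas, the equivalent last step would be to take mask = files.str.contains(...) and pass each name in files[~mask] to os.remove after joining it with the directory path.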

0 votes

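This is the solution I found that works for me. Instead of deleting the non-matching files, it moves the matching files into a separate directory: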

import os

# checklist contains the strings we want to filter on (defined beforehand);
# lower-case them so the comparison below is case-insensitive
checklist = [x.lower() for x in checklist]

m = r''  # path where our files are contained
new_directory = r''  # path where we will move the matched files to


for each_checklist in checklist:
    print('now checking for keyword ' + str(each_checklist))
    for root, dirs, files in os.walk(m):
        for i in files:
            if each_checklist in i.lower():
                # this moves the file from root to the target directory
                os.rename(os.path.join(root, i), os.path.join(new_directory, i))
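The loop above walks the whole tree once per keyword and moves the matches, which is the opposite of what the question asks. A minimal sketch of the inverse, assuming you would rather set the unmatched files aside in a holding folder than delete them outright: it walks the tree once, tests each name against all keywords with any(), and skips the holding folder if it lives inside the source tree. The function name quarantine_unmatched is hypothetical; keywords are assumed to be individual words (for example the result of s.str.split().sum()). shutil.move is used instead of os.rename because it also works across drives; note that two files with the same name in different subfolders will collide in the target directory.

```python
import os
import shutil


def quarantine_unmatched(source_dir, target_dir, keywords):
    """Move every file under `source_dir` whose name contains none of
    `keywords` (case-insensitive substring test) into `target_dir`.
    Returns the sorted list of moved file names.
    """
    keywords = [k.lower() for k in keywords]
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)  # no exist_ok, so this also runs on Python 2.7
    target_abs = os.path.abspath(target_dir)
    moved = []
    for root, dirs, files in os.walk(source_dir):
        # prune the holding folder so already-moved files are not revisited
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != target_abs]
        for name in files:
            if not any(k in name.lower() for k in keywords):
                shutil.move(os.path.join(root, name),
                            os.path.join(target_dir, name))
                moved.append(name)
    return sorted(moved)
```

Once you have checked the holding folder, deleting it (for example with shutil.rmtree) finishes the job.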