I have some code that opens a text file containing a list of file paths, like so:
C:/Users/User/Desktop/mini_mouse/1980
C:/Users/User/Desktop/mini_mouse/1982
C:/Users/User/Desktop/mini_mouse/1984
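It then opens each of these files in turn, line by line, and applies some filtering to the contents. I then want it to write the results to a completely different folder: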
output_location = 'C:/Users/User/Desktop/test2/'
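At the moment, my code writes the results to the location the original file was opened from, i.e. if it opens C:/Users/User/Desktop/mini_mouse/1980, the output ends up in the same folder under the name "1980_filtered". However, I want the output to go into output_location. Can anyone see where I'm going wrong? Any help would be greatly appreciated! Here is my code: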
import os

def main():
    stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:/Users/User/Desktop/test2/'
    list_file = 'C:/Users/User/Desktop/list_of_files.txt'

    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # Create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)

if __name__ == "__main__":
    main()
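Assuming you are on Windows (your paths look like a regular Windows file system), the native path separator is the backslash. Note that this only applies to Windows. I know it's annoying, so I've switched the paths over for you (you're welcome :)). Be aware that Python's file functions also accept forward slashes on Windows, so this is a matter of convention rather than a strict requirement. If you do use backslashes in an ordinary string literal, you have to double them, because a single backslash is treated as the start of an escape sequence.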
import os

def main():
    stop_words_path = 'C:\\Users\\User\\Desktop\\NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:\\Users\\User\\Desktop\\test2\\'
    list_file = 'C:\\Users\\User\\Desktop\\list_of_files.txt'

    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # Create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)

if __name__ == "__main__":
    main()
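As a side note on the separator question, here is a small sketch showing that the forward-slash and backslash spellings refer to the same Windows path once normalised. It uses the standard-library ntpath module (the Windows flavour of os.path) so that it behaves the same on any operating system; the variable names are only illustrative, and the raw-string form is just one alternative to doubling the backslashes.

```python
import ntpath  # Windows path rules, usable on any OS

forward = 'C:/Users/User/Desktop/list_of_files.txt'
doubled = 'C:\\Users\\User\\Desktop\\list_of_files.txt'
raw = r'C:\Users\User\Desktop\list_of_files.txt'  # raw string: backslashes are not escapes

# Doubled backslashes and a raw string spell exactly the same text.
same_literal = (doubled == raw)

# normpath converts forward slashes to backslashes under Windows rules.
normalised = ntpath.normpath(forward)

print(same_literal)
print(normalised == doubled)
```

Both comparisons should come out true, which suggests that the choice of separator on its own is unlikely to change where the output file is written.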
Based on your code, the problem appears to be in this line:
new_file_path = os.path.join(output_location, file_name) + '_filtered'
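In Python's os.path.join(), any absolute path in the arguments (or, on Windows, any argument with a drive letter) discards everything that came before it and restarts the join from that absolute path (or drive letter). Since you take file_name straight from list_of_files.txt, and every path in that file is an absolute path on the C: drive, each call to os.path.join() throws away output_location and falls back to the original file path.
For a fuller explanation of this behaviour, see "Why doesn't os.path.join() work in this case?".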
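The behaviour described above can be reproduced with a short sketch. It uses ntpath, the Windows implementation behind os.path, so the result is the same whichever operating system it runs on; on Windows itself, os.path.join gives the same answers. The variable names joined and relative are only for illustration.

```python
import ntpath  # the Windows implementation of os.path

output_location = 'C:/Users/User/Desktop/test2/'
file_name = 'C:/Users/User/Desktop/mini_mouse/1980'

# The second argument is absolute (it has a drive letter and a root),
# so everything before it is discarded.
joined = ntpath.join(output_location, file_name)
print(joined)

# A bare file name is relative, so it is appended as expected.
relative = ntpath.join(output_location, '1980')
print(relative)
```

The first join returns the original file path unchanged, which is why the "_filtered" file ends up next to the source file rather than in output_location.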
When building the output path, you can strip the file name "1980" out of the path "C:/Users/User/Desktop/mini_mouse/1980" and then join output_location with that isolated file name.
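One way to do that, as a minimal sketch: os.path.basename returns the last component of a path, and that can then be joined onto output_location. The helper name build_output_path is hypothetical and not part of the original code; the "_filtered" suffix follows the question.

```python
import os

def build_output_path(output_location, source_path):
    """Return the path of the filtered copy inside output_location."""
    # Strip the trailing newline left over from reading the list file.
    source_path = source_path.strip()
    # Keep only the last path component, e.g. '1980'.
    base_name = os.path.basename(source_path)
    # base_name is relative, so output_location is no longer discarded.
    return os.path.join(output_location, base_name) + '_filtered'

new_file_path = build_output_path(
    'C:/Users/User/Desktop/test2/',
    'C:/Users/User/Desktop/mini_mouse/1980\n',
)
print(new_file_path)
```

In the question's loop, this would replace the existing new_file_path assignment, with file_name passed as source_path. The output folder still has to exist before the file is opened for writing.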