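My code works fine except for the hashing. It works well when hashing text files, but as soon as it hits a jpg or another file type, it crashes. I know this is some kind of encoding error, but I'm confused about how to correctly encode non-text files.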
#import libraries
import os
import time
from datetime import datetime
import logging
import hashlib
from prettytable import PrettyTable
from pathlib import Path
import glob
#user input
path = input ("Please enter directory: ")
print ("===============================================")
#processing input
if os.path.exists(path):
    print("Processing directory: ", (path))
else:
    print("Invalid directory.")
    logging.basicConfig(filename="error.log", level=logging.ERROR)
    logging.error(' The directory is not valid, please run the script again with the correct directory.')
print ("===============================================")
#process directory
directory = Path(path)
paths = []
filename = []
size = []
hashes = []
modified = []
files = list(directory.glob('**/*.*'))
for file in files:
    paths.append(file.parents[0])
    filename.append(file.parts[-1])
    size.append(file.stat().st_size)
    modified.append(datetime.fromtimestamp(file.stat().st_mtime))
    with open(file) as f:
        hashes.append(hashlib.md5(f.read().encode()).hexdigest())
#output into table
report = PrettyTable()
column_names = ['Path', 'File Name', 'File Size', 'Last Modified Time', 'MD5 Hash']
report.add_column(column_names[0], paths)
report.add_column(column_names[1], filename)
report.add_column(column_names[2], size)
report.add_column(column_names[3], modified)
report.add_column(column_names[4], hashes)
report.sortby = 'File Size'
print(report)
Change the following lines
    with open(file) as f:
        hashes.append(hashlib.md5(f.read().encode()).hexdigest())
to
    with open(file, "rb") as f:
        hashes.append(hashlib.md5(f.read()).hexdigest())
This way, you read the content directly as bytes and compute the hash from those bytes.
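Your version tries to read the file as text and then re-encode it to bytes. Reading a file in text mode means the code tries to decode it using the system's encoding. For some byte combinations this fails, because they are not valid byte sequences in that encoding.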
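To illustrate the failure, here is a minimal sketch. It assumes the system encoding is UTF-8 (yours may differ) and uses four bytes that commonly appear at the start of a JPEG file; the variable names are made up for this example.

```python
import hashlib

# Bytes that commonly start a JPEG file. 0xFF is never valid in UTF-8.
data = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Decoding as text (assumed here to be UTF-8) fails on these bytes,
# which is roughly what happens inside f.read() in text mode.
try:
    data.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Hashing the raw bytes needs no decoding at all.
digest = hashlib.md5(data).hexdigest()

print("decoded as text:", decoded_ok)
print("md5 of raw bytes:", digest)
```

On a system with a different default encoding, other byte values may be the ones that trigger the error, but the cause is the same.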
So simply read everything directly as bytes.
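If some of the files are large, reading each one fully into memory may be wasteful. A possible variation, sketched below, feeds the file to the hash in chunks. The helper name `md5_of_file` and the `chunk_size` default are hypothetical and not part of the original script.

```python
import hashlib


def md5_of_file(file_path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in binary mode chunk by chunk."""
    digest = hashlib.md5()
    with open(file_path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Inside the loop, this could be used as `hashes.append(md5_of_file(file))`. The result should be the same as hashing the whole content at once, since `update()` on successive chunks is equivalent to hashing their concatenation.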