I created a logger that writes log files to a folder in my Databricks project:
import logging
import os

def configure_logger(logger, logfile, level=logging.DEBUG):
    """
    Configures a logger with both file and stream handlers.

    Parameters:
    - logger (logging.Logger): The logger to configure.
    - logfile (str): The path to the logfile.
    - level (int, optional): The logging level. Defaults to logging.DEBUG.

    Returns:
    - tuple: A tuple containing the configured logger and the file handler.
    """
    logger.setLevel(level)

    # Create a file handler with detailed formatting for log output
    file_handler = logging.FileHandler(logfile, mode="w")
    fformatter = logging.Formatter('%(name)s - %(levelname)s: %(message)s')
    file_handler.setFormatter(fformatter)
    logger.addHandler(file_handler)

    # Create a stream handler with simple formatting for cell output
    stream_handler = logging.StreamHandler()
    sformatter = logging.Formatter('%(levelname)s: %(message)s')
    stream_handler.setFormatter(sformatter)
    logger.addHandler(stream_handler)

    # Silence noisy PySpark loggers
    logging.getLogger("py4j").setLevel(logging.ERROR)
    logging.getLogger("Comm").setLevel(logging.ERROR)

    return logger, file_handler
# Set up root logger
logger = logging.getLogger()
logfile = '/Workspace/Project/Logs/dev/dev_20SEP2024.log' # example
# Ensure the directory exists
log_dir = os.path.dirname(logfile)
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
logger, file_handler = configure_logger(logger, logfile=logfile)
The code works most of the time, but every so often I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/Workspace/Project/Logs/dev/dev_20SEP2024.log'
File <command-3660103723975015>, line 10
7 if not os.path.exists(log_dir):
8 os.makedirs(log_dir)
---> 10 logger, file_handler = configure_logger(logger, logfile=logfile)
But the folder exists:
%sh
ls '/Workspace/Project/Logs/dev'
# returns dev_16SEP2024.log
So the folder definitely exists, and logs were written to it a few days earlier. After restarting the notebook and waiting a while, it seems to work again, apparently at random. I'm not sure what the problem is, but it looks like an issue in Databricks or in the logging library. Has anyone dealt with this? Whatever it is, I need to figure out why it behaves this way, because others are likely to run into the same problem.
If I re-run the code without closing the handlers, the file handler seems to get stuck (or something along those lines). The problem appears to be fixed since I added this:
def configure_logger(logger, logfile=None, level=logging.DEBUG):
    logger.setLevel(level)

    try:
        # Create a file handler with detailed formatting for log output
        file_handler = logging.FileHandler(logfile, mode="w")
    except FileNotFoundError:
        # For some reason, sometimes this error is triggered even though
        # the file exists. Removing the file fixes this.
        os.remove(logfile)
        file_handler = logging.FileHandler(logfile, mode="w")
    fformatter = logging.Formatter('%(name)s - %(levelname)s: %(message)s')
    file_handler.setFormatter(fformatter)
    logger.addHandler(file_handler)

    # Create a stream handler with simple formatting for cell output
    stream_handler = logging.StreamHandler()
    sformatter = logging.Formatter('%(levelname)s: %(message)s')
    stream_handler.setFormatter(sformatter)
    logger.addHandler(stream_handler)

    # Silence noisy PySpark loggers
    logging.getLogger("py4j").setLevel(logging.ERROR)
    logging.getLogger("Comm").setLevel(logging.ERROR)

    return logger, file_handler
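In case it matters: my suspicion is that each re-run leaves the previous FileHandler attached to the root logger, still holding an open file object. A minimal sketch of closing and detaching stale handlers before reconfiguring (using a temporary directory instead of the Workspace path, and a hypothetical `reset_handlers` helper, so this may not be exactly what Databricks needs) would be:

```python
import logging
import os
import tempfile

def reset_handlers(logger):
    """Close and detach any existing handlers so a re-run does not
    leave a stale FileHandler holding an open file object."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

# Simulated re-run: configure, tear down, configure again.
logger = logging.getLogger("demo")
logfile = os.path.join(tempfile.mkdtemp(), "run.log")

reset_handlers(logger)
logger.addHandler(logging.FileHandler(logfile, mode="w"))
logger.warning("first run")

reset_handlers(logger)  # safe to call before every reconfiguration
logger.addHandler(logging.FileHandler(logfile, mode="w"))
logger.warning("second run")

with open(logfile) as f:
    print(f.read().strip())  # only "second run" survives mode="w"
```

Calling `reset_handlers` at the top of `configure_logger` would make it idempotent across cell re-runs, which is the situation where the stuck handler seems to appear.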