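I am working on log analysis. I first extract the dates from a log file, then use those dates to define a start date and an end date. Only the content that falls within the selected range should be available, which effectively filters the log content by date.
I have successfully extracted the dates with a regular expression, but filtering the log content by start and end date does not work as expected.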
    import re
    from datetime import datetime, timedelta, timezone

    import streamlit as st  # assumed from the st.write call below

    @staticmethod
    def filter_log_entries(log_content, start_date, end_date):
        # Interpret both bounds as midnight UTC on the given day.
        start_datetime = datetime.strptime(start_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
        end_datetime = datetime.strptime(end_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
        # Adjust end_datetime to include the entire end day
        end_datetime = end_datetime + timedelta(days=1) - timedelta(seconds=1)
        # Capture the bracketed timestamp, e.g. [17/May/2015:10:05:03 +0000].
        log_entry_pattern = re.compile(r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')
        filtered_entries = []
        for line in log_content.split('\n'):
            match = log_entry_pattern.search(line)
            if match:
                entry_datetime_str = match.group(1)
                try:
                    entry_datetime = datetime.strptime(entry_datetime_str, '%d/%b/%Y:%H:%M:%S %z')
                    if start_datetime <= entry_datetime <= end_datetime:
                        filtered_entries.append(line)
                except ValueError:
                    st.write(f"Date parsing error for line: {line}")
        filtered_log_content = "\n".join(filtered_entries)
        return filtered_log_content
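Log content (as displayed):
The dates in the log file have the format [17/May/2015:10:05:03 +0000], and the log file ends at [20/May/2015:10:05:03 +0000]. I want to filter the log content so that if I select the date range 17/May/2015 to 18/May/2015, only the content within that range is selected.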
    83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:47 +0000] "GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js HTTP/1.1" 200 26185 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:12 +0000] "GET /presentations/logstash-monitorama-2013/plugin/zoom-js/zoom.js HTTP/1.1" 200 7697 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:07 +0000] "GET /presentations/logstash-monitorama-2013/plugin/notes/notes.js HTTP/1.1" 200 2892 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:34 +0000] "GET /presentations/logstash-monitorama-2013/images/sad-medic.png HTTP/1.1" 200 430406 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:57 +0000] "GET /presentations/logstash-monitorama-2013/css/fonts/Roboto-Bold.ttf HTTP/1.1" 200 38720 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:50 +0000] "GET /presentations/logstash-monitorama-2013/css/fonts/Roboto-Regular.ttf HTTP/1.1" 200 41820 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:24 +0000] "GET /presentations/logstash-monitorama-2013/images/frontend-response-codes.png HTTP/1.1" 200 52878 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
    83.149.9.216 - - [17/May/2015:10:05:50 +0000]
Full link: https://github.com/linuxacademy/content-elastic-log-samples/blob/master/access.log
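The first step described in the question, reading the available dates from the log so they can serve as start and end choices, might look roughly like the sketch below. The function name `extract_log_dates` and the shortened sample lines are hypothetical and only for illustration. The timestamp pattern is the one used in the question's code.

```python
import re
from datetime import datetime

# Same bracketed timestamp shape as in the question's log lines.
TIMESTAMP_PATTERN = re.compile(
    r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]'
)


def extract_log_dates(log_content):
    """Return the sorted, de-duplicated calendar dates found in the log text."""
    dates = set()
    # finditer scans the whole string, so it does not depend on line breaks.
    for match in TIMESTAMP_PATTERN.finditer(log_content):
        try:
            stamp = datetime.strptime(match.group(1), '%d/%b/%Y:%H:%M:%S %z')
        except ValueError:
            continue  # skip text that matches the pattern but is not a valid date
        dates.add(stamp.date())
    return sorted(dates)


# Hypothetical, shortened entries in the same layout as the sample above.
sample = (
    '83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 200 1\n'
    '83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /b HTTP/1.1" 200 2\n'
    '83.149.9.216 - - [20/May/2015:10:05:03 +0000] "GET /c HTTP/1.1" 200 3\n'
)
available = extract_log_dates(sample)
print(available[0], available[-1])  # earliest and latest date in the log
```

Each date is taken in the entry's own UTC offset. That matches UTC here because every sample entry uses +0000.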
To get the content that includes the start date and excludes the end date, you can use the following code:
    import re
    from datetime import datetime, timezone


    def test_rith_filter(rith_lg_cnt_test, ri_strt_dt, ri_ed_dt):
        # Interpret both bounds as midnight UTC on the given day.
        rith_st_dt = datetime.strptime(ri_strt_dt, '%d/%b/%Y').replace(tzinfo=timezone.utc)
        rith_ed_dt = datetime.strptime(ri_ed_dt, '%d/%b/%Y').replace(tzinfo=timezone.utc)
        # Capture the bracketed timestamp, e.g. [17/May/2015:10:05:03 +0000].
        ri_lg_pt = re.compile(r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')
        Tested_Result_Out = []
        for ri in rith_lg_cnt_test.split('\n'):
            cat = ri_lg_pt.search(ri)
            if cat:
                ri_entry_dt_str = cat.group(1)
                try:
                    ri_entry_dt = datetime.strptime(ri_entry_dt_str, '%d/%b/%Y:%H:%M:%S %z')
                    # Half-open range: the start day is included, the end day is excluded.
                    if rith_st_dt <= ri_entry_dt < rith_ed_dt:
                        Tested_Result_Out.append(ri)
                except ValueError:
                    print(f"There is an error in Parsing it Rithwik Bojja: {ri}")
        ri_res = "\n".join(Tested_Result_Out)
        return ri_res


    rith_lg_cnt_test = """
    83.149.9.216 - - [17/May/2015:10:05:03 +0000] "Test .1700.66 Safari/500.99"
    83.149.9.216 - - [18/May/2015:12:05:03 +0000] " Rithwik afari/500.99"
    83.149.9.216 - - [19/May/2015:14:05:03 +0000] "GET /presentations/logstash-monitorama Bojja .66 Safari/500.99"
    """
    ri_strt_dt = "17/May/2015"
    ri_ed_dt = "18/May/2015"
    test_rith_call = test_rith_filter(rith_lg_cnt_test, ri_strt_dt, ri_ed_dt)
    print(test_rith_call)
Output:

    83.149.9.216 - - [17/May/2015:10:05:03 +0000] "Test .1700.66 Safari/500.99"
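If the end date should be included as well, as the question's 17/May/2015 to 18/May/2015 example suggests, one possible variant moves the upper bound to midnight of the following day and keeps the strict `<` comparison. This is only a sketch under that assumption. The name `filter_inclusive` and the shortened log lines are hypothetical, and both dates are interpreted as UTC days, as in the code above.

```python
import re
from datetime import datetime, timedelta, timezone

LOG_TIMESTAMP = re.compile(
    r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]'
)


def filter_inclusive(log_content, start_date, end_date):
    """Keep lines whose timestamp falls on start_date through end_date (UTC days)."""
    start = datetime.strptime(start_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
    # Exclusive upper bound: midnight at the start of the day after end_date.
    end_exclusive = (
        datetime.strptime(end_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
        + timedelta(days=1)
    )
    kept = []
    for line in log_content.splitlines():
        match = LOG_TIMESTAMP.search(line)
        if not match:
            continue  # blank lines and lines without a timestamp are dropped
        try:
            stamp = datetime.strptime(match.group(1), '%d/%b/%Y:%H:%M:%S %z')
        except ValueError:
            continue  # timestamp-shaped text that is not a valid date
        if start <= stamp < end_exclusive:
            kept.append(line)
    return "\n".join(kept)


# Hypothetical, shortened entries for illustration.
log = """
83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 200 1
83.149.9.216 - - [18/May/2015:12:05:03 +0000] "GET /b HTTP/1.1" 200 2
83.149.9.216 - - [19/May/2015:14:05:03 +0000] "GET /c HTTP/1.1" 200 3
"""
result = filter_inclusive(log, "17/May/2015", "18/May/2015")
print(result)
```

With these three sample entries, the 17 May and 18 May lines are kept and the 19 May line is dropped. Comparing against the next day's midnight avoids subtracting a second from the end bound.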