"processors" : [
{
"grok": {
"field": "log",
"patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
"pattern_definitions" : {
"TIME_STAMP" : "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
},
"ignore_failure" : true,
"ignore_missing" : true
}
},
{
"kv" : {
"field": "logtail",
"field_split": "\\s(?![^=]+?(\\s|$))",
"value_split": "=",
"ignore_failure" : true
}
},
{
"remove" : {
"field": "logtail",
"ignore_failure" : true
}
},
{
"date" : {
"field" : "ts",
"formats" : ["yyyy-MM-dd HH:mm:ss,SSS"],
"ignore_failure" : true
}
}
]
The above is our grok pipeline.
Normally our log lines are clean and tidy, for example:
"2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26"
This works perfectly, but as soon as a value in the log contains one more = sign, everything falls apart!
"2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26"
This must be a pretty common use case; is there a ready-made fix for it?
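
In case it helps reproduce this: the kv step can be tested in isolation with the simulate API. Below is a minimal sketch that feeds the already-stripped tail of the bad line straight into the kv processor, with ignore_failure removed so the actual error shows up in the response:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "kv": {
          "field": "logtail",
          "field_split": "\\s(?![^=]+?(\\s|$))",
          "value_split": "="
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "logtail": "level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent=\"ELB-HealthChecker/2.0\" request_action=finish duration=0.005 status=200 content_length=26"
      }
    }
  ]
}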
Maybe splitting the field the way it is done for fortinet is the better approach. At least it works with your example.
"field_split": " (?=[a-z\\_\\-]+=)"