Writing a grok pattern for key-value pairs

{
  "processors": [
    {
      "grok": {
        "field": "log",
        "patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
        "pattern_definitions": {
          "TIME_STAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
        },
        "ignore_failure": true,
        "ignore_missing": true
      }
    },
    {
      "kv": {
        "field": "logtail",
        "field_split": "\\s(?![^=]+?(\\s|$))",
        "value_split": "=",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "logtail",
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "ts",
        "formats": ["yyyy-MM-dd HH:mm:ss,SSS"],
        "ignore_failure": true
      }
    }
  ]
}

That is our grok pipeline: grok splits off the timestamp, and kv parses the rest of the line into fields.
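To make the first and last steps of the pipeline above concrete, here is a rough Python sketch of what grok and date do to one line. The regex is a simplified stand-in for the TIME_STAMP/GREEDYDATA grok pattern (an assumption, not the real grok definitions), strptime stands in for the Java date format, and split_timestamp and parse_ts are hypothetical helper names.

```python
import re
from datetime import datetime

# Simplified stand-in (an assumption, not the real grok library) for
#   %{TIME_STAMP:ts} %{GREEDYDATA:logtail}
# with TIME_STAMP = %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.
# It only covers the "2024-09-24 15:07:59,572" shape seen in these logs.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<logtail>.*)"
)


def split_timestamp(log: str) -> dict:
    """Rough analogue of the grok step: add ts and logtail next to log."""
    m = LINE_RE.match(log)
    if m is None:
        # Like ignore_failure: true, leave a non-matching document unchanged.
        return {"log": log}
    return {"log": log, **m.groupdict()}


def parse_ts(ts: str) -> datetime:
    """Rough analogue of the date step's "yyyy-MM-dd HH:mm:ss,SSS" format."""
    # %f right-pads "572" to 572000 microseconds.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


doc = split_timestamp("2024-09-24 15:07:59,572 level=INFO status=200")
print(doc["ts"])                        # 2024-09-24 15:07:59,572
print(doc["logtail"])                   # level=INFO status=200
print(parse_ts(doc["ts"]).isoformat())  # 2024-09-24T15:07:59.572000
```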

Normally our logs are clean and tidy.

For example:

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26

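This works perfectly, but as soon as a value contains another = (here, the query string in path=/job?id=12345), everything breaks: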

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26
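A few lines of plain Python (not the kv processor itself) show why a second = makes a pair ambiguous. This only illustrates the ambiguity; it is not a claim about exactly how the kv processor fails.

```python
pair = "path=/job?id=12345"

# Splitting on every "=" gives three pieces, so there is no single key/value.
print(pair.split("="))     # ['path', '/job?id', '12345']

# Splitting only on the first "=" keeps the query string inside the value.
print(pair.split("=", 1))  # ['path', '/job?id=12345']
```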

This must be a very common use case. Is there a ready-made fix?

Tags: logstash-grok, elk
1 answer

Maybe splitting fields the way Elastic's Fortinet module does is the better approach. At least it works for your example:

"field_split": " (?=[a-z\\_\\-]+=)"

See: https://github.com/elastic/beats/blob/master/x-pack/filebeat/module/fortinet/firewall/ingest/pipeline.yml#L6-L17
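One way to sanity-check the suggested field_split before changing the pipeline is to try the same lookahead regex in Python. This is a sketch, not the kv processor: it assumes Python's re treats this simple pattern the way Elasticsearch's Java regex engine does, and that kv splits each pair on its first = only. parse_kv is a hypothetical helper.

```python
import re

# The answer's field_split with JSON escaping removed:
#   JSON " (?=[a-z\\_\\-]+=)"  ->  regex " (?=[a-z\_\-]+=)"
# Split on a space only when what follows looks like "key=",
# where a key is lowercase letters, "_" or "-".
FIELD_SPLIT = re.compile(r" (?=[a-z\_\-]+=)")


def parse_kv(logtail: str) -> dict:
    """Rough emulation of the kv step (assumes a split on the first "=" only)."""
    result = {}
    for part in FIELD_SPLIT.split(logtail):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments with no "=" at all
            result[key] = value
    return result


tail = ('level=INFO channel=wsgi.request method=GET path=/job?id=12345 '
        'user_agent="ELB-HealthChecker/2.0" request_action=finish '
        'duration=0.005 status=200 content_length=26')
fields = parse_kv(tail)
print(fields["path"])        # /job?id=12345
print(fields["user_agent"])  # "ELB-HealthChecker/2.0"
```

Because the lookahead only accepts keys made of lowercase letters, _ and -, a key containing digits or uppercase letters (for example a hypothetical HTTPStatus=200) would not start a new pair and would stay glued to the previous value. The quotes around user_agent are also kept as part of the value in this sketch.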

© www.soinside.com 2019 - 2024. All rights reserved.