awk script to parse logs and extract specific values


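I am trying to write a Bash script that processes a log file and extracts specific values based on given conditions. The log is located at the provided URL and contains real web server log data. Each line in the log begins with a date, as shown below: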

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

This is the script I have tried so far:

awk '/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];

    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw

But it keeps saying:

awk: line 2: syntax error at or near ,
awk: line 3: syntax error at or near ,

What am I doing wrong?

unix awk
1 Answer

If the contents of web-logs-raw are

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

then

awk '/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];

    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw

gives the output

 []
 []
 []
 []

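That is the output under GNU Awk 5.3.1, so you are evidently not using GNU AWK. The three-argument form of match(), which fills an array with the captured groups, is a GNU extension; other awk implementations reject it, hence the syntax error at the comma. Check whether you have gawk installed; if you do (or you are allowed to install gawk), change awk to gawk to resolve the syntax error.

Notice that there is another problem: request_id= and fwd= are not on the same line as heroku, so the pattern and the two match() calls never see the same record. This is easy to work around in GNU AWK, because setting RS to the empty string is enough to enable paragraph mode: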

gawk 'BEGIN{RS=""}/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];

    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw

which gives the output

b19a87a1-1bbb-46e7-b207-bd9f23d46afa [108.31.000.000]
910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" [108.31.000.000]
097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" [108.31.000.000]
d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" [108.31.000.000]
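In the output above, three of the request_id values run on into the following line: [^ ]+ stops only at a space, and in paragraph mode a record contains newlines. The following is a minimal sketch, not part of the original answer, that also excludes tabs and newlines from the match and avoids the gawk-only three-argument match() by using RSTART and RLENGTH, which the two-argument match() sets in any POSIX awk. The function name extract_request_ids is hypothetical, and the MASKED check is kept exactly as in the question.

```shell
# Hypothetical helper; intended to work with any POSIX awk (mawk, BWK awk, gawk).
extract_request_ids() {
    awk 'BEGIN { RS = "" }   # paragraph mode: blank lines separate records
    /coderbyte heroku\/router/ {
        req_value = ""
        fwd_value = ""
        # Two-argument match() sets RSTART and RLENGTH instead of filling an array.
        if (match($0, /request_id=[^ \t\n]+/)) {
            # Skip the 11 characters of "request_id=".
            req_value = substr($0, RSTART + 11, RLENGTH - 11)
        }
        if (match($0, /fwd="[^"]+"/)) {
            # Skip the 5 leading characters (fwd= plus quote) and the closing quote.
            fwd_value = substr($0, RSTART + 5, RLENGTH - 6)
        }
        if (fwd_value == "MASKED") {
            print req_value " [M]"
        } else {
            print req_value " [" fwd_value "]"
        }
    }' "$@"
}
```

It can be called as extract_request_ids web-logs-raw; each matching record should then print its request_id followed by the fwd value in brackets on a single line.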

Explanation: when RS is set to the empty string, GNU AWK assumes that records are separated by blank lines.
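As a small illustration of paragraph mode on its own, the sketch below feeds awk two blank-line-separated paragraphs and prints how many fields each record has. The helper name count_paragraph_fields and the sample input are made up for this example. The behaviour is specified by POSIX for awk in general, so plain awk should behave the same as gawk here.

```shell
# Hypothetical helper: report the number of fields in each paragraph-mode record.
count_paragraph_fields() {
    awk 'BEGIN { RS = "" } { print "record " NR ": " NF " fields" }' "$@"
}

# Two paragraphs: "a b" + "c" on consecutive lines, then "d e" after a blank line.
printf 'a b\nc\n\nd e\n' | count_paragraph_fields
# record 1: 3 fields
# record 2: 2 fields
```

The first record spans two lines yet counts as one record with three fields, which is why the heroku/router pattern, request_id= and fwd= all end up in the same $0 once RS is empty.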
