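I am trying to write a Bash script that processes a log file and extracts specific values based on given conditions. The log is located at the provided URL and contains real web server log data. Each line in the log starts with a date, as shown below: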
Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
Here is the script I have tried so far:
awk '/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];
    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw
But it keeps reporting:
awk: line 2: syntax error at or near ,
awk: line 3: syntax error at or near ,
What am I doing wrong?
Let the content of web-logs-raw be
Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
then
awk '/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];
    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw
gives the output
[]
[]
[]
[]
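when run with GNU Awk 5.3.1, so you are evidently not using GNU AWK. The three-argument form of match is a GNU AWK extension, which is why other awk implementations report a syntax error at the comma. Check whether you have gawk; if you do (or you are allowed to install it), change awk to gawk to resolve the syntax error. Note that there is another problem: request_id= and fwd= are not on the same line as heroku. This is easy to work around in GNU AWK, because setting RS to the empty string is enough to enable paragraph mode: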
gawk 'BEGIN{RS=""}/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];
    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw
which gives the output
b19a87a1-1bbb-46e7-b207-bd9f23d46afa [108.31.000.000]
910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" [108.31.000.000]
097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" [108.31.000.000]
d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" [108.31.000.000]
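Note that in the output above, the request IDs taken from wrapped entries carry a line break and the start of the next line. This happens because [^ ]+ stops only at a space, and in paragraph mode the record itself contains newlines. Below is a minimal sketch of one possible way around that, which also avoids the gawk-only three-argument match by relying on the standard two-argument match together with RLENGTH. The helper name value_of and the file name web-logs-sample are hypothetical, and the sample assumes the entries are separated by blank lines.

```shell
#!/bin/sh
# Hypothetical sample file: two router entries in the wrapped layout shown
# above, separated by a blank line (paragraph mode depends on that blank line).
cat > web-logs-sample <<'EOF'
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
EOF

result=$(awk '
# value_of (hypothetical helper): return the text that follows key in the
# current record, up to the next blank, tab, newline or double quote.
function value_of(key,    start, rest) {
    start = index($0, key)
    if (start == 0) return ""
    rest = substr($0, start + length(key))
    if (!match(rest, /^[^ \t\n"]+/)) return ""
    return substr(rest, 1, RLENGTH)
}
BEGIN { RS = "" }                      # paragraph mode: one entry per record
/coderbyte heroku\/router/ {
    req_value = value_of("request_id=")
    fwd_value = value_of("fwd=\"")
    if (fwd_value == "MASKED") {
        print req_value " [M]"
    } else {
        print req_value " [" fwd_value "]"
    }
}' web-logs-sample)

printf '%s\n' "$result"
```

With this sample, each request ID should be printed on a single line followed by its fwd value in brackets. Stopping at any whitespace rather than only at a space is the design choice that keeps the wrapped lines out of the captured value.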
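Explanation: when RS is set to the empty string, GNU AWK treats blank lines as record separators, so each blank-line-separated entry in web-logs-raw is read as a single record.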
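To see the effect of paragraph mode in isolation, here is a small sketch that counts records for the same input with the default record separator and with RS set to the empty string. The sample text and the function name sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: a two-line entry, a blank line, then a one-line entry.
sample() {
    printf 'entry one, line 1\nentry one, line 2\n\nentry two\n'
}

# Default RS: every line is a record, including the blank one.
by_line=$(sample | awk 'END { print NR }')

# RS = "": blank lines separate records, so each entry is one record.
by_paragraph=$(sample | awk 'BEGIN { RS = "" } END { print NR }')

echo "line mode: $by_line records"
echo "paragraph mode: $by_paragraph records"
```

Line mode reports 4 records and paragraph mode reports 2, which is why the wrapped request_id= and fwd= text becomes visible to a pattern that matches the first line of an entry.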