I have a set of log files that all follow essentially the same format as this example (file1.text):
================================================
Running taskId=[updateFieldInTbl]
startTime: 16:03:34,580
------------------------------------------------
INFO:DBExecute: SQL=[ UPDATE tbl set field = value where thing > 0; ]
SQL: UPDATE tbl set field = value where thing > 0
Statement affected [746664] rows.
------------------------------------------------
Finished taskId=[updateFieldInTbl]
endTime: 16:06:30,571
elapsed: 00:02:55,991
failure: false
anyFailure: false
================================================
================================================
Running taskId=[calculateChecksum]
startTime: 16:06:30,571
------------------------------------------------
INFO:DBExecute: SQL=[ update tbl set checksum = MD5(CONCAT_WS('',field, field2, field3)); ]
SQL: update tbl set checksum = MD5(CONCAT_WS('',field, field2, field3));
Statement affected [9608630] rows.
================================================
===== Greater than 5 minutes Review! ==========
================================================
------------------------------------------------
Finished taskId=[calculateChecksum]
endTime: 16:44:04,473
elapsed: 00:37:33,901
failure: false
anyFailure: false
================================================
================================================
Running taskId=[deleteMatchingChecksum]
startTime: 16:44:04,473
------------------------------------------------
INFO:DBExecute: SQL=[ delete tbl from tbl inner join other on tbl.checksum = other.checksum; ]
SQL: delete tbl from tbl inner join other on tbl.checksum = other.checksum;
Statement affected [9276213] rows.
================================================
===== Greater than 5 minutes Review! ==========
================================================
------------------------------------------------
Finished taskId=[deleteMatchingChecksum]
endTime: 17:49:26,817
elapsed: 01:05:22,344
failure: false
anyFailure: false
================================================
================================================
Running taskId=[deletemissinguserDataChecksum]
startTime: 17:49:26,817
------------------------------------------------
INFO:DBExecute: SQL=[ delete from tbl where some_id =0; ]
SQL: delete from tbl where some_id =0;
Statement affected [0] rows.
------------------------------------------------
Finished taskId=[deletemissinguserDataChecksum]
endTime: 17:49:26,847
elapsed: 00:00:00,030
failure: false
anyFailure: false
================================================
I would like to convert each of them into something like this:
file1 | taskId | startTime | endTime | elapsed | rowsAffected | Info | failure | anyFailure
file1 | updateFieldInTbl | 16:03:34 | 16:06:30 | 00:02:55 | 746664 | SQL=[ UPDATE tbl set field = value where thing > 0; ] | false | false
file1 | calculateChecksum | 16:06:30 | 16:44:04 | 00:37:33 | 9608630 | SQL=[ update tbl set checksum = MD5(CONCAT_WS('',field, field2, field3)); ] | false | false
file1 | deleteMatchingChecksum | 16:44:04 | 17:49:26 | 01:05:22 | 9276213 | SQL=[ delete tbl from tbl inner join other on tbl.checksum = other.checksum; ] | false | false
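Normally I would just have the system log to a database table from the start, so that the logs are already in an easy-to-use format, but that is not an option right now, so I have to parse the existing logs into something similarly useful.
What tools would you recommend? The goal is to build something with bash scripting as far as possible. Any guidance on how to build the parser would be much appreciated.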
I'd suggest Awk.
Processing:
awk 'NR==1{
    # drop the 5-character ".text" suffix; awk strings are 1-indexed
    fn=substr(FILENAME,1,length(FILENAME)-5);
    print fn" | taskId | startTime | endTime | elapsed | rowsAffected | Info | failure | anyFailure"
}
/Running taskId/{ gsub(/^.+=\[|\]$/, ""); taskId=$0 }
/startTime:/{ sub(/,.*/,"",$2); startTime=$2 }
/INFO:/{ sub(/^INFO:DBExecute: /,""); info=$0 }
/ affected/{ gsub(/\[|\]/,"",$3); affected=$3 }
/endTime/{ sub(/,.*/,"",$2); endTime=$2 }
/elapsed/{ sub(/,.*/,"",$2); elapsed=$2 }
/^failure/{ fail=$2 }
# anyFailure is the last field of a task block, so print the record here
/anyFailure/{
    printf "%s | %s | %s | %s | %s | %d | %s | %s | %s\n",
        fn, taskId, startTime, endTime, elapsed, affected, info, fail, $2
}' file1.text
Output:
file1 | taskId | startTime | endTime | elapsed | rowsAffected | Info | failure | anyFailure
file1 | updateFieldInTbl | 16:03:34 | 16:06:30 | 00:02:55 | 746664 | SQL=[ UPDATE tbl set field = value where thing > 0; ] | false | false
file1 | calculateChecksum | 16:06:30 | 16:44:04 | 00:37:33 | 9608630 | SQL=[ update tbl set checksum = MD5(CONCAT_WS('',field, field2, field3)); ] | false | false
file1 | deleteMatchingChecksum | 16:44:04 | 17:49:26 | 01:05:22 | 9276213 | SQL=[ delete tbl from tbl inner join other on tbl.checksum = other.checksum; ] | false | false
file1 | deletemissinguserDataChecksum | 17:49:26 | 17:49:26 | 00:00:00 | 0 | SQL=[ delete from tbl where some_id =0; ] | false | false
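Since the question mentions a whole set of log files, here is a minimal sketch of how the same extraction might be run over several files in one pass. It is not part of the answer above: the function name `parse_logs` is made up for illustration, the first column of the header is the literal word `file`, and the file name is recomputed on `FNR == 1` because `NR==1` only fires for the very first input file. The patterns are also anchored with `^`, and the per-task variables are cleared at each `Running` line so that a missing field does not inherit the previous task's value.

```shell
#!/bin/sh
# parse_logs: hypothetical helper name, not from the original answer.
# Prints one pipe-separated row per task for every log file given as argument.
parse_logs() {
    awk '
    BEGIN {
        print "file | taskId | startTime | endTime | elapsed | rowsAffected | Info | failure | anyFailure"
    }
    # First line of each input file: derive the label from its name
    # (strip any directory part, then the last extension).
    FNR == 1 { fn = FILENAME; sub(/.*\//, "", fn); sub(/\.[^.]*$/, "", fn) }
    /^Running taskId/ {
        gsub(/^.+=\[|\]$/, ""); taskId = $0
        # reset per-task state so stale values never leak into the next row
        startTime = endTime = elapsed = affected = info = fail = ""
    }
    /^startTime:/      { sub(/,.*/, "", $2); startTime = $2 }
    /^INFO:DBExecute:/ { sub(/^INFO:DBExecute: /, ""); info = $0 }
    /^Statement affected/ { gsub(/\[|\]/, "", $3); affected = $3 }
    /^endTime:/        { sub(/,.*/, "", $2); endTime = $2 }
    /^elapsed:/        { sub(/,.*/, "", $2); elapsed = $2 }
    /^failure:/        { fail = $2 }
    /^anyFailure:/ {
        printf "%s | %s | %s | %s | %s | %s | %s | %s | %s\n",
            fn, taskId, startTime, endTime, elapsed, affected, info, fail, $2
    }
    ' "$@"
}
```

It could then be called as `parse_logs file1.text file2.text` or `parse_logs *.text`; the header is printed once and each row carries the name of the file it came from.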
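FWIW, I try to avoid hard-coding specific field names. Most of the input lines follow the same "tag: value" format, so there is no need to test for every tag; only the few lines that do not follow the general format need to be handled separately: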
$ cat tst.awk
BEGIN { OFS="," }
# skip blank lines, separator lines, and the redundant SQL:/Finished lines
!NF || /^([^[:alpha:]]|SQL|Finished)/ { next }
{ tag = val = $0 }
/^Running/ {
    prt()    # a new task starts, so flush the previous record
    gsub(/^[^ ]+ |=.*/,"",tag)
    gsub(/.*\[|\].*/,"",val)
}
/^Statement/ {
    tag = "rowsAffected"
    gsub(/.*\[|\].*/,"",val)
}
# the general "tag: value" case
/^[:[:alpha:]]+: / {
    sub(/:.*/,"",tag)
    sub(/^[:[:alpha:]]+: /,"",val)
}
{
    tags[++numTags] = tag
    tag2val[tag] = val
}
END { prt() }
function prt(   tag,val,tagNr) {
    if (numTags > 0) {
        if ( ++recNr == 1 ) {
            # header row, printed once from the tags of the first record
            printf "\"%s\"%s", "file", OFS
            for (tagNr=1; tagNr<=numTags; tagNr++) {
                tag = tags[tagNr]
                printf "\"%s\"%s", tag, (tagNr<numTags ? OFS : ORS)
            }
        }
        printf "\"%s\"%s", FILENAME, OFS
        for (tagNr=1; tagNr<=numTags; tagNr++) {
            tag = tags[tagNr]
            val = tag2val[tag]
            gsub(/"/,"\"\"",val)    # CSV-escape embedded double quotes
            printf "\"%s\"%s", val, (tagNr<numTags ? OFS : ORS)
        }
    }
    delete tags
    delete tag2val
    numTags = 0
}
I also made it output CSV so that you can read it into Excel or do whatever else you like with it:
$ awk -f tst.awk file1
"file","taskId","startTime","INFO","rowsAffected","endTime","elapsed","failure","anyFailure"
"file1","updateFieldInTbl","16:03:34,580","SQL=[ UPDATE tbl set field = value where thing > 0; ]","746664","16:06:30,571","00:02:55,991","false","false"
"file1","calculateChecksum","16:06:30,571","SQL=[ update tbl set checksum = MD5(CONCAT_WS('',field, field2, field3)); ]","9608630","16:44:04,473","00:37:33,901","false","false"
"file1","deleteMatchingChecksum","16:44:04,473","SQL=[ delete tbl from tbl inner join other on tbl.checksum = other.checksum; ]","9276213","17:49:26,817","01:05:22,344","false","false"
"file1","deletemissinguserDataChecksum","17:49:26,817","SQL=[ delete from tbl where some_id =0; ]","0","17:49:26,847","00:00:00,030","false","false"
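If you really care about the column order, you can easily tweak the script to output the field values by their specific tags rather than in the numeric order in which they were read.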
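As a rough illustration of that tweak, the sketch below keeps the same tag/value parsing but prints the columns from a fixed list of tags. The wrapper name `logs_to_csv`, the `cols` array, the `fname` and `seen` variables, and the particular column order are all assumptions made for this example, not part of the script above. The array is cleared with `split("", tag2val)`, which should also work in awks that do not support deleting a whole array.

```shell
#!/bin/sh
# logs_to_csv: hypothetical wrapper name used only for this sketch.
logs_to_csv() {
    awk '
    BEGIN {
        OFS = ","
        # Assumed column order; edit this list to rearrange the output.
        numCols = split("taskId startTime endTime elapsed rowsAffected INFO failure anyFailure", cols, " ")
        printf "\"file\""
        for (i = 1; i <= numCols; i++) printf "%s\"%s\"", OFS, cols[i]
        print ""
    }
    !NF || /^([^[:alpha:]]|SQL|Finished)/ { next }
    { tag = val = $0 }
    /^Running/ {
        prt()                 # flush the previous record, if any
        fname = FILENAME      # remember which file this record belongs to
        gsub(/^[^ ]+ |=.*/, "", tag)
        gsub(/.*\[|\].*/, "", val)
    }
    /^Statement/ {
        tag = "rowsAffected"
        gsub(/.*\[|\].*/, "", val)
    }
    /^[:[:alpha:]]+: / {
        sub(/:.*/, "", tag)
        sub(/^[:[:alpha:]]+: /, "", val)
    }
    { tag2val[tag] = val; seen = 1 }
    END { prt() }
    function prt(   i, val) {
        if (seen) {
            printf "\"%s\"", fname
            # look each value up by tag, in the order given by cols
            for (i = 1; i <= numCols; i++) {
                val = tag2val[cols[i]]
                gsub(/"/, "\"\"", val)
                printf "%s\"%s\"", OFS, val
            }
            print ""
        }
        split("", tag2val)    # portable way to empty the array
        seen = 0
    }
    ' "$@"
}
```

Called as `logs_to_csv file1`, this would put `endTime` and `elapsed` right after `startTime`, and a tag that is missing from a record simply yields an empty quoted field.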