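I have an old Perl project, a text parser for an event log, and I have been asked to sort the output by event ID and remove duplicate events. The parser reads a text file and puts each event into an array. Each array element holds a hash with several key => value pairs. One key is called sequence and holds the event's number. I now want to sort the array by the sequence value of each element. Second, I want to remove elements with duplicate sequence numbers from the array.
Here is the code that builds the array and the hashes, so you can see the data structure: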
open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";
my $i=0;
my $eventcount = 0;
while (<$mel>) {
    # Separate events by "Date/Time" :
    if (/^$/) {
        next;
    }
    if (/^Date\/Time:\s(.*)$/) {
        if ($eventcount >0) {
            $i++;
        }
        $eventcount++; # eventcount initialized with '0'
    }
    # Gathering information of the MEL event :
    if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = $1; next;}
    if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = $1; next;}
    # No "|" inside the character class: there it would match a literal pipe.
    if (/^Event type:\s([0-9a-fA-F]{1,6})$/) {$MEL[$i]{type} = lc $1; next;}
    if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = $1; next;}
    if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = $1; next;}
    if (/^Description:\s(.*)$/) {$MEL[$i]{description} = $1; next;}
    if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = $1; next;}
    if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = $1; next;}
    if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = $1; next;}
    if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex $1;}
}
Example of an event in the text file:
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
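So basically, I want to sort an array of hash references by the value of one key in each hash.
Second, I want to remove an element from the array when its value for that key also appears in another element.
I hope someone understands what I need :-)
Is this possible?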
my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;
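The line above only sorts. As a minimal sketch, sorting and de-duplicating the array can be combined in one statement. It assumes that every element has a numeric sequence key and that the first occurrence of a duplicate is the one to keep. The contents of @MEL here are made-up sample data, and @unique_sorted is a name chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the parsed @MEL array.
my @MEL = (
    { sequence => 230, date => '2/3/20, 12:19:20 PM' },
    { sequence => 200, date => '2/3/20, 12:18:20 PM' },
    { sequence => 205, date => '2/3/20, 12:18:25 PM' },
    { sequence => 205, date => '2/3/20, 12:18:25 PM' },
);

# grep keeps an element only the first time its sequence number is seen;
# sort then orders the survivors numerically.
my %seen;
my @unique_sorted =
    sort { $a->{sequence} <=> $b->{sequence} }
    grep { !$seen{ $_->{sequence} }++ } @MEL;

print join(',', map { $_->{sequence} } @unique_sorted), "\n";   # prints 200,205,230
```

To keep the last occurrence instead, one option is to run grep over reverse @MEL.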
However, it is much easier to use a hash of hashes instead of an array of hashes:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;
my %event;
my ($current, $id);
while (<$mel>) {
    next if /^$/;
    if (m{^Date/Time:\s(.*)$}) {
        # A new event starts: store the previous one under its sequence number.
        if (defined $id) {
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^Event type:\s([0-9a-fA-F]{1,6})$/) {
        $current->{type} = lc $1;
    } elsif (/^Event category:\s(\w+)$/) {
        $current->{category} = $1;
    } elsif (/^Priority:\s(\w+)/) {
        $current->{priority} = $1;
    } elsif (/^Description:\s(.*)$/) {
        $current->{description} = $1;
    } elsif (/^Event specific codes:\s(.*)$/) {
        $current->{code} = $1;
    } elsif (/^Component location:\s(.*)$/) {
        $current->{location} = $1;
    } elsif (/^Logged by:\s.*(.)$/) {
        $current->{logged_by} = $1;
    } elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
        $current->{version} = hex $1;
    }
}
# Store the last event, which is not followed by another "Date/Time" line.
$event{$id} = $current if defined $id;
for my $e (sort { $a <=> $b } keys %event) {
    say "Sequence number: $e";
    for my $k (sort keys %{ $event{$e} }) {
        say "$k: $event{$e}{$k}";
    }
}
This can be simplified further by building one large regular expression that matches most of the fields:
my $regex = qr/
     Event\ type:\s(?<type>[0-9a-fA-F]{1,6})$
    |Event\ category:\s(?<category>\w+)$
    |Priority:\s(?<priority>\w+)
    |Description:\s(?<description>.*)$
    |Event\ specific\ codes:\s(?<code>.*)$
    |Component\ location:\s(?<location>.*)$
    |Logged\ by:\s.*(?<logged_by>.)$
    |4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;
while (<$mel>) {
    next if /^$/;
    if (m{^Date/Time:\s(.*)$}) {
        if (defined $id) {
            $current->{type} = lc $current->{type}
                if exists $current->{type};
            $current->{version} = hex $current->{version}
                if exists $current->{version};
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^$regex/) {
        # Only the alternative that matched sets a named capture in %+.
        $current->{ (keys %+)[0] } = (values %+)[0];
    } else {
        warn "Skipping: $_";
    }
}
# Store the last event, which is not followed by another "Date/Time" line.
if (defined $id) {
    $current->{type} = lc $current->{type}
        if exists $current->{type};
    $current->{version} = hex $current->{version}
        if exists $current->{version};
    $event{$id} = $current;
}
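To show why a single key can be pulled out of %+ after matching the alternation, here is a small sketch with a cut-down regex and two made-up input lines. Since only the names of groups that actually captured appear in %+, a match on one alternative leaves exactly one key. The %fields hash and the sample lines are assumptions for illustration:

```perl
use strict;
use warnings;

# Cut-down version of the alternation, for illustration only.
my $regex = qr/
     Priority:\s(?<priority>\w+)
    |Description:\s(?<description>.*)$
/x;

my %fields;
for my $line ('Priority: Informational', 'Description: Controller return status') {
    if ($line =~ /^$regex/) {
        # Only the branch that matched populates %+, so it has one key here.
        my ($name) = keys %+;
        $fields{$name} = $+{$name};
    }
}

print "$_ => $fields{$_}\n" for sort keys %fields;
```

This prints the description entry first and the priority entry second, because the keys are sorted alphabetically.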
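Split the file into records, then fill a hash that uses the event number as the key and the record as the value, skipping duplicates.
Then sort the hash keys and output the records.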
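use strict;
use warnings;
use feature 'say';
my %events;
my %seen;
my $data = do { local $/; <DATA> };
# Insert a blank line in front of every record so we can split on it.
$data =~ s!\n(Date/Time)!\n\n$1!g;
my @data = split '\n\n', $data;
for my $record (@data) {
    my $event = get_event_n( $record );
    next unless defined $event;    # record without a sequence number
    next if $seen{$event};         # keep only the first occurrence
    $seen{$event} = 1;
    $events{$event} = $record;
}
say '----- Sorted Events -----';
# Sort numerically; a plain string sort would put 1000 before 200.
for my $event (sort { $a <=> $b } keys %events) {
    say $events{$event};
    say '-' x 45; # record separator as visual indicator
}
sub get_event_n {
    my $record = shift;
    # List assignment yields undef on a failed match instead of a stale $1.
    my ($sequence) = $record =~ /Sequence number:\s+(\d+)/;
    return $sequence;
}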
__DATA__
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:28 PM
Sequence number: 209
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 92, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
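As a variation on the record splitting above, the file contents can also be split directly in front of each Date/Time line with a lookahead, without first inserting blank lines. This is only a sketch under assumptions: the input is a shortened, made-up sample, every record is taken to start with a Date/Time line, and the first record per sequence number is the one kept. The //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical, shortened input; real records carry more fields.
my $data = <<'END';
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
END

# Split in front of every line that starts with "Date/Time:".
# The lookahead (?=...) consumes nothing, so the header stays in its record.
my @records = split m{^(?=Date/Time:)}m, $data;

my %events;
for my $record (@records) {
    # Skip records that carry no sequence number.
    my ($sequence) = $record =~ /^Sequence number:\s+(\d+)/m or next;
    $events{$sequence} //= $record;    # keep the first record per number
}

print join(',', sort { $a <=> $b } keys %events), "\n";   # prints 200,230
```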
my $key = 'sequence'; #or other fields
my $keep = 'first'; #or 'last' record with identical $key
my $regex = qr{
Date/Time: \s* (?<date>.*)
|Sequence\ number: \s* (?<sequence>\d+)
|Event\ type: \s* (?<type>[0-9a-fA-F]{1,6})
|Event\ category: \s* (?<category>\w+)
|Priority: \s* (?<priority>\w+)
|Description: \s* (?<description>.*)
|Event\ specific\ codes: \s* (?<code>.*)
|Component\ location: \s* (?<location>.*)
|Logged\ by: \s* (?<logged_by>.*)
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
}x;
my @event=();
while (<>) {
    m{^Date/Time:} and push @event, {};
    m{^$regex} and @{$event[-1]}{keys %+} = values %+;
}
#special treatment for type and version: hex and lc
exists $$_{type} and $$_{type} = hex $$_{type} for @event;
exists $$_{version} and $$_{version} = lc $$_{version} for @event;
#mark for deletion
my %exists; $exists{$$_{$key}}++ and $$_{delete}=1
    for $keep eq 'first' ? @event
      : $keep eq 'last'  ? reverse(@event)
      : die "keep must be first or last";
#delete those marked
@event = grep !$$_{delete}, @event;
#sort by $key
@event = sort { $$a{$key} <=> $$b{$key} } @event;
I am guessing that type should be hex'ed and version should be lc'ed, rather than the other way round as in the question.
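For reference, a tiny sketch of what the two built-ins do to values like those in the log. The input strings are sample values picked for illustration:

```perl
use strict;
use warnings;

# hex() reads a string of hexadecimal digits and returns its numeric value.
my $type = hex '5023';
# lc() only lower-cases letters; digits pass through unchanged.
my $version = lc '4D';

print "$type $version\n";   # prints 20515 4d
```

Note that hex changes how the event type prints (a decimal number instead of the original hex string), so which of the two to apply depends on how the field is used later.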
Run it with:
perl script.pl input_file