Perl-排序复杂数据结构

问题描述 投票:0回答:3

我有一个旧的perl项目,一个eventlog的文本解析器,并且收到了一个请求,要求按事件ID对输出进行排序并删除重复的事件。因此,解析器读取一个文本文件,并将每个事件放入一个数组中。数组中的每个字段都包含具有多个键->值对的哈希。一个键称为序列,它包含事件的编号。我现在想根据每个数组字段的序列值对数组进行排序。其次,我想从数组中删除重复的相同序列号。

这里有一些代码,我如何创建数组和哈希值,以便您了解数据结构:

open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";

my $i=0;
my $eventcount = 0;

while (<$mel>) {

        # Separate events by "Date/Time" :
        if (/^$/) {
            next;
        }
        if (/^Date\/Time:\s(.*)$/) {
            if ($eventcount >0) {
                $i++;
            }
            $eventcount++; # eventcount initialized with ‘0’
        }

        # Gathering information of the MEL event :
        if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = $1; next;}
        if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = $1; next;}
        if (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {$MEL[$i]{type} = lc $1; next;}
        if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = $1; next;}
        if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = $1; next;}
        if (/^Description:\s(.*)$/) {$MEL[$i]{description} = $1; next;}
        if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = $1; next;}
        if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = $1; next;}
        if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = $1; next;}
        if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex $1;}

}

文本文件中的事件示例:

Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A

因此,基本上,我想根据哈希中键的值对包含对哈希的引用的数组进行排序。

第二,当键的值也存在于另一个数组字段中时,我想从数组中删除字段。

我希望有人能理解我的需求:-)

这可能吗?

arrays perl sorting hash unique
3个回答
1
投票
my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;

但是使用哈希散列而不是哈希数组要容易得多。

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;

my %event;

my ($current, $id);
while (<$mel>) {

    next if /^$/;

    if (m{^Date/Time:\s(.*)$}) {
        if (defined $id) {
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {
        $current->{type} = lc $1;
    } elsif (/^Event category:\s(\w+)$/) {
        $current->{category} = $1;
    } elsif (/^Priority:\s(\w+)/) {
        $current->{priority} = $1;
    } elsif (/^Description:\s(.*)$/) {
        $current->{description} = $1;
    } elsif (/^Event specific codes:\s(.*)$/) {
        $current->{code} = $1;
    } elsif (/^Component location:\s(.*)$/) {
        $current->{location} = $1;
    } elsif (/^Logged by:\s.*(.)$/) {
        $current->{logged_by} = $1;
    } elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
        $current->{version} = hex $1;
    }
}

for my $e (sort { $a <=> $b } keys %event) {
    say 'Sequence number:', $e;
    for my $k (sort keys %{ $event{$e} }) {
        say "$k: $event{$e}{$k}";
    }
}

可以通过构建一个大的正则表达式来匹配大多数细节来进一步简化:

my $regex = qr/
               Event\ type:\s(?<type>[0-9|a-f|A-F]{1,6})$
              |Event\ category:\s(?<category>\w+)$
              |Priority:\s(?<priority>\w+)
              |Description:\s(?<description>.*)$
              |Event\ specific\ codes:\s(?<code>.*)$
              |Component\ location:\s(?<location>.*)$
              |Logged\ by:\s.*(?<logged>.)$
              |4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;

while (<$mel>) {
    next if /^$/;

    if (m{^Date/Time:\s(.*)$}) {
        if (defined $id) {
            $current->{type} = lc $current->{type}
                if exists $current->{type};
            $current->{version} = hex $current->{version}
                if exists $current->{version};
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^$regex/) {
        $current->{ (keys %+)[0] } = (values %+)[0];
    } else {
        warn "Skipping: $_";
    }
}

0
投票
如果上述假设正确,那么任务就很简单。

将文件分割成记录,然后用

事件编号作为键填充哈希,并记录值作为跳过重复项的值。

然后对键哈希进行排序并输出记录。

use strict; use warnings; use feature 'say'; my %events; my %seen; my $data = do { local $/; <DATA> }; $data =~ s!\n(Date/Time)!\n\n$1!g; my @data = split '\n\n', $data; for my $record (@data) { my $event = get_event_n( $record ); next if $seen{$event}; $seen{$event} = 1; $events{$event} = $record; } say '----- Sorted Events -----'; for my $event (sort keys %events) { say $events{$event}; say '-' x 45; # record separator as visual indicator } sub get_event_n { my $record = shift; my $sequence; $record =~ /Sequence number:\s+(\d+)/; $sequence = $1; return $sequence; } __DATA__ Date/Time: 2/3/20, 12:19:20 PM Sequence number: 230 Event type: 5023 Event category: Command Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller return status/function call for requested operation Event specific codes: b8/1/0 Component type: Controller Component location: Shelf 99, Bay A Logged by: Controller in bay A Date/Time: 2/3/20, 12:18:20 PM Sequence number: 200 Event type: 5023 Event category: Command Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller return status/function call for requested operation Event specific codes: b8/1/0 Component type: Controller Component location: Shelf 99, Bay A Logged by: Controller in bay A Date/Time: 2/3/20, 12:18:25 PM Sequence number: 205 Event type: 5023 Event category: Command Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller return status/function call for requested operation Event specific codes: b8/1/0 Component type: Controller Component location: Shelf 99, Bay B Logged by: Controller in bay B Date/Time: 2/3/20, 12:18:28 PM Sequence number: 209 Event type: 5023 Event category: Command Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller return status/function call for requested operation Event specific codes: b8/1/0 Component type: Controller Component location: Shelf 92, Bay B Logged by: Controller in bay B Date/Time: 2/3/20, 12:18:25 PM Sequence number: 205 Event type: 5023 Event category: Command Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller return status/function call for requested operation Event specific codes: b8/1/0 Component type: Controller Component location: Shelf 99, Bay B Logged by: Controller in bay B

0
投票
my $key = 'sequence'; #or other fields my $keep = 'first'; #or 'last' record with identical $key my $regex = qr{ Date/Time: \s* (?<date>.*) |Sequence\ number: \s* (?<sequence>\d+) |Event\ type: \s* (?<type>[0-9|a-f|A-F]{1,6}) |Event\ category: \s* (?<category>\w+) |Priority: \s* (?<priority>\w+) |Description: \s* (?<description>.*) |Event\ specific\ codes: \s* (?<code>.*) |Component\ location: \s* (?<location>.*) |Logged\ by: \s* (?<logged_by>.*) |4[dD]\s45\s4[cC]\s48\s(?<version>\d\d) }x; my @event=(); while (<>) { m{^Date/Time:} and push @event, {}; m{^$regex} and @{$event[-1]}{keys %+} = values %+; } #special treatment for type and version: hex and lc exists $$_{type} and $$_{type} = hex $$_{type} for @event; exists $$_{version} and $$_{version} = lc $$_{version} for @event; #mark for deletion my %exists; $exists{$$_{$key}}++ and $$_{delete}=1 for $keep eq 'first' ? @event : $keep eq 'last' ? reverse(@event) : die "keep must be first or last"; #delete those marked @event = grep !$$_{delete}, @event; #sort by $key @event = sort { $$a{$key} <=> $$b{$key} } @event;

我猜想类型应该是hex ed,版本应该是lc ed,而不是像问题中的相反。

运行方式:

perl script.pl input_file

© www.soinside.com 2019 - 2024. All rights reserved.