Perl data cleaning: how do I build a list of unique names and store all these small lists in a hash?

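I am a beginner in Perl and I have to do some data cleaning. To give a quick idea of what I am doing: I have a list of IDs (Horse_ID), and each ID is one horse. The horses took part in several races, which is why their IDs appear on multiple lines (each line corresponds to one race). For each race, they competed with a different trainer (or not).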

Original data, showing exactly what I am dealing with

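I want to store in a hash the names of the different trainers each horse has had, along with the number of different trainers per horse.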

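After looking up some information on Stack Overflow, I put together some code. But my code only prints the name of the trainer it found and a wrong number of trainers (sometimes 0, which makes no sense). I can't find the error... Here is my code: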

if (($coach =~ /\w+/) && ($Horse_ID ne '')) { 
    if (($trainerhash -> {$Horse_ID} -> {trainerinfo}) && ($trainerhash -> {$Horse_ID} -> {trainerinfo} !~ /$Horse_ID/)) {
        $trainerhash -> {$Horse_ID} -> {trainerinfo} .= "\t$coach"; 
        my @coach = split (/\s/, $coach);
         $numtrainers = $#coach+1;

    }
    elsif (!$trainerhash -> {$Horse_ID} -> {trainerinfo}) {
        $trainerhash -> {$Horse_ID} -> {trainerinfo} = "$coach";
    }

}

# Trainer hash - Number of trainers & trainer names
$trainerhash -> {$Horse_ID} -> {trainerinfo} = "$numtrainers\t$coach"; 

If anyone has an idea, that would be great... I have already tried a for loop, but the result is the same.

Input data

Horse_ID    name    date    localisation    distance race_kategory  rider   rider_weight    coach
1   Abakus  03/11/2018  Warszawa    1400    I   V_Popov 58  S_Vasyutov  6
1   Abakus  09/09/2018  Warszawa    1800    I   V_Popov 58  S_Vasyutov  5
1   Abakus  12/08/2018  Warszawa    1800    I   A_Kabarov   58  S_Vasyutov  x
1   Abakus  30/06/2018  Warszawa    1800    I   V_Popov 58  S_Vasyutov  8
1   Abakus  09/06/2018  Warszawa    1600    II  V_Popov 58  S_Vasyutov  1
2   Abbas   19/11/2017  Warszawa    2000    I   S_Vasyutov  58  S_Vasyutov  3
2   Abbas   28/10/2017  Warszawa    1400    II  P_Naoniechnyi   58  S_Vasyutov x
2   Abbas   08/10/2017  Warszawa    1400    II  P_Naoniechnyi   58  S_Vasyutov x

Thanks in advance for your help,

arrays loops perl if-statement hash
2 Answers
0 votes

Hashes are ideal for finding duplicates or grouping items. We will use them to group the rows by horse and to remove duplicate trainers.

The number of trainers is simply the number of elements in the trainers array, so we don't need to store it anywhere.

use strict;
use warnings;
use feature qw( say );

my %trainers_by_horse;
if (defined( $_ = <DATA> )) {
   my @headers = split;
   while (<DATA>) {
      my %fields;
      @fields{@headers} = split;
      my $horse_id = $fields{Horse_ID};
      my $trainer  = $fields{coach};
      ++$trainers_by_horse{$horse_id}{$trainer};
   }
}

# Convert from
#    $trainers_by_horse{$horse_id}{$trainer} = $num_rows;
# to
#    $trainers_by_horse{$horse_id} = \@trainers;

for my $trainers (values(%trainers_by_horse)) {
   $trainers = [ sort keys(%$trainers) ];
}

for my $horse_id (keys(%trainers_by_horse)) {
   my $trainers     = $trainers_by_horse{$horse_id};
   my $num_trainers = @$trainers;

   # Print the horse ID, the trainer count, and the comma-separated trainer names.
   say(join("\t", $horse_id, $num_trainers, join(",", @$trainers)));
}
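
The two ideas in this answer (a hash to detect duplicates, and a count derived from the array rather than stored) can be seen in isolation in the following minimal sketch. The helper name distinct and the sample list are assumptions made for illustration; they are not part of the answer's code. Unlike the answer, which sorts the names, this variant keeps them in first-seen order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct values of a list, in first-seen order.
# %seen counts occurrences; grep keeps a value only the first time it appears.
sub distinct {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @coaches  = qw( S_Vasyutov S_Vasyutov J_Pochwatka S_Vasyutov );
my @trainers = distinct(@coaches);
my $count    = @trainers;    # an array in scalar context yields its length

print "$count\t@trainers\n";
```

Because the count is recomputed from the array whenever it is needed, it cannot drift out of sync with the list of names.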

-1 votes

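Please see how the following code snippet works. It:

  • reads the input
  • splits each line into fields
  • stores the fields in the %race hash
  • builds the %trainer hash from the information of interest
  • outputs the %trainer hash data (the OP did not provide the desired output format)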

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $debug = 0;

my($header,@fields);
my %trainer;

$header = <DATA>;
chomp $header;

@fields = split ' ', $header;

while( <DATA> ) {
    chomp;
    next if /^$/;           # skip empty lines
    my %race;
    @race{@fields} = split;
    # Record the coach only if this horse does not already list them (exact match).
    push @{ $trainer{$race{Horse_ID}}{trainerinfo} }, $race{coach}
        if not grep { $_ eq $race{coach} } @{ $trainer{$race{Horse_ID}}{trainerinfo} };
}

say Dumper(\%trainer) if $debug;

say '
Horse   Count   Trainers
----------------------------';
for ( sort { $a <=> $b } keys %trainer ) {    # numeric sort, so that 10 follows 9
    printf "%d\t%d\t%s\n",
            $_, 
            scalar @{$trainer{$_}{trainerinfo}},
            join(', ', @{$trainer{$_}{trainerinfo}});
}

__DATA__
Horse_ID    name    date    localisation    distance race_kategory  rider   rider_weight    coach
1   Abakus  03/11/2018  Warszawa    1400    I   V_Popov 58  S_Vasyutov  6
1   Abakus  09/09/2018  Warszawa    1800    I   V_Popov 58  S_Vasyutov  5
1   Abakus  12/08/2018  Warszawa    1800    I   A_Kabarov   58  S_Vasyutov  x
1   Abakus  30/06/2018  Warszawa    1800    I   V_Popov 58  S_Vasyutov  8
1   Abakus  09/06/2018  Warszawa    1600    II  V_Popov 58  S_Vasyutov  1
2   Abbas   19/11/2017  Warszawa    2000    I   S_Vasyutov  58  S_Vasyutov  3
2   Abbas   28/10/2017  Warszawa    1400    II  P_Naoniechnyi   58  S_Vasyutov x
2   Abbas   08/10/2017  Warszawa    1400    II  P_Naoniechnyi   58  S_Vasyutov x
2   Abbas   30/07/2017  Warszawa    1800    II  P_Naoniechnyi   58  S_Vasyutov x
3   Abdank  19/05/2018  Warszawa    1600    II  S_Vasyutov  58  S_Vasyutov  3
4   Adlina  07/09/2008  Wrocaw  1700    II  D_Szope 56  J_Pochwatka 9
4   Adlina  07/09/2008  Wrocaw  1800    II  D_Szope 58  S_Vasyutov  6

Output

Horse   Count   Trainers
----------------------------
1       1       S_Vasyutov
2       1       S_Vasyutov
3       1       S_Vasyutov
4       2       J_Pochwatka, S_Vasyutov

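Note:

  • Consider naming the hash %horse when it holds the list of a horse's trainers/coaches
  • Consider naming the hash %trainer when it holds the list of horses a trainer has trained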
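
The naming suggestion above can be sketched as follows, building both lookups in one pass. This is only an illustration under stated assumptions: the helper name index_races and its input format (a list of [Horse_ID, coach] pairs) are hypothetical and do not come from the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [horse_id, coach] pairs and returns two hash refs,
#   %horse   : horse ID => sorted list of that horse's trainers
#   %trainer : coach    => sorted list of the horse IDs they trained
sub index_races {
    my (@rows) = @_;
    my (%horse, %trainer);
    for my $row (@rows) {
        my ($horse_id, $coach) = @$row;
        $horse{$horse_id}{$coach}   = 1;    # trainers seen for this horse
        $trainer{$coach}{$horse_id} = 1;    # horses seen for this trainer
    }
    # Replace each inner "seen" hash with a sorted list of its keys.
    $_ = [ sort keys %$_ ]               for values %horse;
    $_ = [ sort { $a <=> $b } keys %$_ ] for values %trainer;
    return (\%horse, \%trainer);
}

my ($horse, $trainer) = index_races(
    [ 4, 'J_Pochwatka' ], [ 4, 'S_Vasyutov' ],
    [ 1, 'S_Vasyutov' ],  [ 1, 'S_Vasyutov' ],
);
print scalar @{ $horse->{4} }, "\n";                  # number of trainers of horse 4
print join(',', @{ $trainer->{S_Vasyutov} }), "\n";   # horses trained by S_Vasyutov
```

With the hashes named after their keys, $horse->{4} reads as "the trainers of horse 4" and $trainer->{S_Vasyutov} as "the horses of this trainer", which is the distinction the note draws.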