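I'm a Perl beginner and I have to do some data cleaning. To give a quick idea of what I'm doing: I have a list of IDs (Horse_ID), where each ID is one horse. The horses ran several races, which is why their IDs appear on multiple rows (each row corresponds to one race). For each race, a horse may or may not have competed under a different coach.
Original data, to show exactly what I am dealing with
I would like to store in a hash the names of the different coaches for each horse, as well as the number of different coaches each horse has had.
After looking up some information on Stack Overflow, I put together some code. But my code only prints the first trainer name found and a wrong number of trainers (sometimes 0, which makes no sense). I can't find the error... Here is my code: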
if (($coach =~ /\w+/) && ($Horse_ID ne '')) {
    if (($trainerhash -> {$Horse_ID} -> {trainerinfo}) && ($trainerhash -> {$Horse_ID} -> {trainerinfo} !~ /$Horse_ID/)) {
        $trainerhash -> {$Horse_ID} -> {trainerinfo} .= "\t$coach";
        my @coach = split (/\s/, $coach);
        $numtrainers = $#coach+1;
    }
    elsif (!$trainerhash -> {$Horse_ID} -> {trainerinfo}) {
        $trainerhash -> {$Horse_ID} -> {trainerinfo} = "$coach";
    }
}
# Trainer hash - Number of trainers & trainer names
$trainerhash -> {$Horse_ID} -> {trainerinfo} = "$numtrainers\t$coach";
If anyone has an idea, that would be great... I have already tried a for loop, but the result is the same.
Input data
Horse_ID name date localisation distance race_kategory rider rider_weight coach
1 Abakus 03/11/2018 Warszawa 1400 I V_Popov 58 S_Vasyutov 6
1 Abakus 09/09/2018 Warszawa 1800 I V_Popov 58 S_Vasyutov 5
1 Abakus 12/08/2018 Warszawa 1800 I A_Kabarov 58 S_Vasyutov x
1 Abakus 30/06/2018 Warszawa 1800 I V_Popov 58 S_Vasyutov 8
1 Abakus 09/06/2018 Warszawa 1600 II V_Popov 58 S_Vasyutov 1
2 Abbas 19/11/2017 Warszawa 2000 I S_Vasyutov 58 S_Vasyutov 3
2 Abbas 28/10/2017 Warszawa 1400 II P_Naoniechnyi 58 S_Vasyutov x
2 Abbas 08/10/2017 Warszawa 1400 II P_Naoniechnyi 58 S_Vasyutov x
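Thanks in advance for your help.
Hashes are great for finding duplicates or grouping items. We'll use them to group the rows by horse and to remove duplicate trainers.
The number of trainers is simply the number of elements in a horse's array of trainers, so we don't need to store it anywhere.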
use strict;
use warnings;
use feature qw( say );

my %trainers_by_horse;

# The first line of the input holds the column names.
if (defined( $_ = <DATA> )) {
    my @headers = split;
    while (<DATA>) {
        my %fields;
        @fields{@headers} = split;

        my $horse_id = $fields{Horse_ID};
        my $trainer  = $fields{coach};

        # Using the trainer as a hash key removes duplicates.
        ++$trainers_by_horse{$horse_id}{$trainer};
    }
}

# Convert from
#    $trainers_by_horse{$horse_id}{$trainer} = $num_rows;
# to
#    $trainers_by_horse{$horse_id} = \@trainers;
for my $trainers (values(%trainers_by_horse)) {
    $trainers = [ sort keys(%$trainers) ];
}

for my $horse_id (keys(%trainers_by_horse)) {
    my $trainers = $trainers_by_horse{$horse_id};
    my $num_trainers = @$trainers;
    say(join("\t", $horse_id, $num_trainers, join(",", @$trainers)));
}
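The two-step idiom above (count into a hash of hashes, then turn each inner hash into a sorted list of its keys) can be tried in isolation. The following is only a minimal sketch under stated assumptions: the rows are assumed to be already reduced to [horse_id, trainer] pairs, and distinct_trainers is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: group rows by horse ID and collect each horse's
# distinct trainers. Each row is assumed to be a [ $horse_id, $trainer ] pair.
sub distinct_trainers {
    my @rows = @_;

    # $seen{$horse_id}{$trainer} = number of rows with that pair
    my %seen;
    ++$seen{ $_->[0] }{ $_->[1] } for @rows;

    # Turn each inner hash into a sorted list of its keys.
    my %trainers_by_horse;
    for my $horse_id (keys %seen) {
        $trainers_by_horse{$horse_id} = [ sort keys %{ $seen{$horse_id} } ];
    }
    return \%trainers_by_horse;
}

my $result = distinct_trainers(
    [ 1, 'S_Vasyutov' ],
    [ 1, 'S_Vasyutov' ],
    [ 4, 'J_Pochwatka' ],
    [ 4, 'S_Vasyutov' ],
);

for my $horse_id (sort { $a <=> $b } keys %$result) {
    my $trainers = $result->{$horse_id};
    # An array in scalar context yields its element count.
    print join("\t", $horse_id, scalar(@$trainers), join(",", @$trainers)), "\n";
}
```

Sorting the horse IDs numerically in the final loop is a choice made for this sketch, since keys returns them in no particular order.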
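Please see the following code snippet, which reads the input, splits each line into fields stored in a %race hash, builds a %trainer hash from the information of interest, and outputs the %trainer hash data (the OP does not specify the desired output format).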
use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $debug = 0;

my($header,@fields);
my %trainer;

$header = <DATA>;
chomp $header;
@fields = split ' ', $header;

while( <DATA> ) {
    chomp;
    next if /^$/;  # skip empty lines

    my %race;
    @race{@fields} = split;

    # Record the coach only if this horse does not have it yet.
    # Compare with eq, not a regex, so names are matched exactly.
    push @{ $trainer{$race{Horse_ID}}{trainerinfo} }, $race{coach}
        if not grep { $_ eq $race{coach} } @{ $trainer{$race{Horse_ID}}{trainerinfo} };
}

say Dumper(\%trainer) if $debug;

say '
Horse Count Trainers
----------------------------';

# Sort the horse IDs numerically rather than as strings.
for ( sort { $a <=> $b } keys %trainer ) {
    printf "%d\t%d\t%s\n",
        $_,
        scalar @{$trainer{$_}{trainerinfo}},
        join(', ', @{$trainer{$_}{trainerinfo}});
}
__DATA__
Horse_ID name date localisation distance race_kategory rider rider_weight coach
1 Abakus 03/11/2018 Warszawa 1400 I V_Popov 58 S_Vasyutov 6
1 Abakus 09/09/2018 Warszawa 1800 I V_Popov 58 S_Vasyutov 5
1 Abakus 12/08/2018 Warszawa 1800 I A_Kabarov 58 S_Vasyutov x
1 Abakus 30/06/2018 Warszawa 1800 I V_Popov 58 S_Vasyutov 8
1 Abakus 09/06/2018 Warszawa 1600 II V_Popov 58 S_Vasyutov 1
2 Abbas 19/11/2017 Warszawa 2000 I S_Vasyutov 58 S_Vasyutov 3
2 Abbas 28/10/2017 Warszawa 1400 II P_Naoniechnyi 58 S_Vasyutov x
2 Abbas 08/10/2017 Warszawa 1400 II P_Naoniechnyi 58 S_Vasyutov x
2 Abbas 30/07/2017 Warszawa 1800 II P_Naoniechnyi 58 S_Vasyutov x
3 Abdank 19/05/2018 Warszawa 1600 II S_Vasyutov 58 S_Vasyutov 3
4 Adlina 07/09/2008 Wrocaw 1700 II D_Szope 56 J_Pochwatka 9
4 Adlina 07/09/2008 Wrocaw 1800 II D_Szope 58 S_Vasyutov 6
Output
Horse Count Trainers
----------------------------
1 1 S_Vasyutov
2 1 S_Vasyutov
3 1 S_Vasyutov
4 2 J_Pochwatka, S_Vasyutov
Note: %horse would keep the list of trainers/coaches of a horse, and %trainer would keep the list of horses trained by a coach.
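The naming suggested in the note can be illustrated by building both lookups in a single pass. This is only a sketch under stated assumptions: the rows are assumed to be already reduced to [horse_id, coach] pairs, and build_indexes is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: build both lookups in one pass.
# Each row is assumed to be a [ $horse_id, $coach ] pair.
sub build_indexes {
    my @rows = @_;
    my (%horse, %trainer);
    for my $row (@rows) {
        my ($horse_id, $coach) = @$row;
        $horse{$horse_id}{$coach}   = 1;    # coaches of this horse
        $trainer{$coach}{$horse_id} = 1;    # horses of this coach
    }
    return (\%horse, \%trainer);
}

my ($horse, $trainer) = build_indexes(
    [ 3, 'S_Vasyutov' ],
    [ 4, 'J_Pochwatka' ],
    [ 4, 'S_Vasyutov' ],
);

# List each coach with the number of horses trained and their IDs.
for my $coach (sort keys %$trainer) {
    my @horses = sort { $a <=> $b } keys %{ $trainer->{$coach} };
    printf "%s\t%d\t%s\n", $coach, scalar @horses, join(', ', @horses);
}
```

Storing the inner level as hash keys rather than arrays keeps duplicates out without a separate grep check.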