散列列表的参数化排序

Question

我的目标是编写一个接受的子程序

一系列哈希
包含排序顺序的列表

要清楚 - 钥匙可能是任何东西。我的例子仅供参考。

给定一个包含所需排序顺序的键列表的数组

my @aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );

我的想法是形成一个字符串

$a->{DELTA1_2} <=> $b->{DELTA1_2} or $a->{SET1} <=> $b->{SET1} or $a->{SET2} <=> $b->{SET2}

然后用eval执行它。

这是我的代码

my $paRecords = [
    { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
    { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
    { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
    { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];
my @aSortOrder = ( 'DELTA1_2', 'SET1', 'SET2' );
my $pStr = '';

foreach ( @aSortOrder ) {
    $pStr = $pStr . ' or $a->{' . $_ . '} <=> $b->{' . $_ . '}';
}

$pStr =~ s/^\s*or\s*//;

my @aSorted = sort { eval "$pStr"; } @$paRecords;

print Dumper \@aSorted;

output

$VAR1 = [
          {
            'SET1' => 8328,
            'SET2' => 8299,
            'DELTA1_2' => -29
          },
          {
            'SET1' => 48265,
            'SET2' => 48264,
            'DELTA1_2' => -1
          },
          {
            'SET2' => 0,
            'DELTA1_2' => 0,
            'SET1' => 10
          },
          {
            'SET2' => 0,
            'DELTA1_2' => 0,
            'SET1' => 20
          }
        ];

我想这远不是解决问题的理想方法，所以关于如何更好地解决这个问题的任何指针都将是一个很大的帮助。

Answer 1

嗯，你是对的 - 使用eval就是通往未来痛苦的道路。

“排序”的乐趣在于你可以定义一个排序子程序，它隐含地定义了$a和$b，你可以使用你想要的任何逻辑来判断它是正，负或“零”比较（相等）。（例如像<=>或cmp那样）。

这里的诀窍是 - 'true'是非零的，所以<=>你可以测试'true'来看看是否有比较（4 <=> 4是'false'）

因此，如果您只是在数字上工作（您需要测试'字母数字'并在某些情况下使用cmp，但似乎不适用于您的数据）：

#!/usr/bin/env perl
use strict;
use warnings;

my $paRecords = [
   { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
   { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
   { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
   { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];

#qw is 'quote-words' and just lets you space delimit terms. 
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my @order = qw ( DELTA1_2 SET1 SET2 );

#note - needs to come after definition of `@order` but it can be re-written later as long as it's in scope. 
#you can pass an order explicitly into the subroutine if you want though. 
sub order_by {
   for my $key (@order) {
      #compare key
      my $result = $a->{$key} <=> $b->{$key};
      #return it and exit the loop if they aren't equal, otherwise 
      #continue iterating sort terms. 
      return $result if $result;
   }
   return 0; #all keys were similar, therefore return zero.
}

print join (",", @order), "\n";
foreach my $record ( sort {order_by} @$paRecords ) {
   #use hash slice to order output in 'sort order'. 
   #optional, but hopefully clarifies what's going on. 
   print join (",", @{$record}{@order}), "\n";
}

这，给出您的数据输出：

DELTA1_2,SET1,SET2
-29,8328,8299
-1,48265,48264
0,10,0
0,20,0

注意，我选择使用hash slice作为输出，因为否则哈希是无序的，因此你的Dumper输出将是不一致的（随机排序的字段）。

如果您需要对订购更加动态，可以将其传递给sort-sub：

#!/usr/bin/env perl
use strict;
use warnings;

sub order_by {
   for my $key (@_) {
      #compare key
      my $result = $a->{$key} <=> $b->{$key};

      #return it and exit the loop if they aren't equal, otherwise
      #continue iterating sort terms.
      return $result if $result;
   }
   return 0;    #all keys were similar, therefore return zero.
}

my $paRecords = [
   { 'SET1' => 48265, 'DELTA1_2' => -1,  'SET2' => 48264 },
   { 'SET1' => 8328,  'DELTA1_2' => -29, 'SET2' => 8299 },
   { 'SET1' => 20,    'DELTA1_2' => 0,   'SET2' => 0 },
   { 'SET1' => 10,    'DELTA1_2' => 0,   'SET2' => 0 }
];

#qw is 'quote-words' and just lets you space delimit terms.
#it's semantically the same as ( 'DELTA1_2', 'SET1', 'SET2' );
my @order = qw ( DELTA1_2 SET1 SET2 );

print join( ",", @order ), "\n";
foreach my $record ( sort {order_by ( @order ) } @$paRecords ) {

   #use hash slice to order output in 'sort order'.
   #optional, but hopefully clarifies what's going on.
   print join( ",", @{$record}{@order} ), "\n";
}

Answer 2

只需创建一个进行比较的子。

sub custom_cmp {
   my $keys = shift;
   for my $key (@$keys) {
      my $cmp = $_[0]{$key} <=> $_[1]{$key};
      return $cmp if $cmp;
   }

   return 0;
}

my @aSorted = sort { custom_cmp(\@aSortOrder, $a, $b) } @$paRecords;

以上为每次比较进行了两次子调用。如果我们生成比较函数，我们可以将其减少为1。

sub make_custom_cmp {
   my @keys = @_;
   return sub($$) {
      for my $key (@keys) {
         my $cmp = $_[0]{$key} <=> $_[1]{$key};
         return $cmp if $cmp;
      }

      return 0;
   };
}

my $cmp = make_custom_cmp(@aSortOrder);

my @aSorted = sort $cmp @$paRecords;

我们可以更进一步，通过代码生成来平整循环。这就是“适当的”基于eval的解决方案的样子。但是，几乎不需要这种级别的优化。

sub make_custom_cmp {
   my @keys = @_;
   my @cmps;
   for $i (0..$#keys) {
      push @cmps, "\$_[0]{\$keys[$i]} <=> \$_[1]{\$keys[$i]}"
   }

   return eval("sub($$) { ".( join(" || ", @cmps) )."}");
}

my $cmp = make_custom_cmp(@aSortOrder);

my @aSorted = sort $cmp @$paRecords;

事实上，以下可能是性能最佳的解决方案：

my @aSorted =
   map $paRecords->[ unpack('N', substr($_, -4))-0x7FFFFFFF ],
      sort
         map pack('N*', map $_+0x7FFFFFFF, @{ $paRecords->[$_] }{@aSortOrder}, $_),
            0..$#$paRecords;

Answer 3

传递给sort的块可能包含任意数量的代码。根据$a是否应被视为小于，等于或大于$b，仅需要评估负数，零或正数。

我同意你决定把它捆绑到一个子程序中，所以我写了sort_hashes_by_keys，它要求对一个哈希数组的引用进行排序，并引用一个关键字符串数组。它返回根据键列表排序的哈希列表

use strict;
use warnings 'all';

use Data::Dump 'dd';

my $records =  [
    { SET1 => 48265, DELTA1_2 => -1,  SET2 => 48264 },
    { SET1 => 8328,  DELTA1_2 => -29, SET2 => 8299  },
    { SET1 => 20,    DELTA1_2 => 0,   SET2 => 0     },
    { SET1 => 10,    DELTA1_2 => 0,   SET2 => 0     }
];

my @sort_order = qw/ DELTA1_2 SET1 SET2 /;

my @sorted = sort_hashes_by_keys( $records, \@sort_order );

dd \@sorted;



sub sort_hashes_by_keys {
    my ( $hashes, $order ) = @_;

    sort {

        my $cmp = 0;

        for my $key ( @$order ) {
            last if $cmp = $a->{$key} <=> $b->{$key};
        }

        $cmp;

    } @$hashes;
}

output

[
  { DELTA1_2 => -29, SET1 => 8328, SET2 => 8299 },
  { DELTA1_2 => -1, SET1 => 48265, SET2 => 48264 },
  { DELTA1_2 => 0, SET1 => 10, SET2 => 0 },
  { DELTA1_2 => 0, SET1 => 20, SET2 => 0 },
]

请注意，在强调命名变量时，我强烈建议不要使用匈牙利表示法和驼峰表示法。 Perl不是严格类型的，它有像$，@和%这样的符号，它们表示每个变量的类型，因此匈牙利符号充其量是多余的，并且还会增加分散注意力和无关紧要的噪音。此外，按照惯例，大写字母是为模块名称和全局变量保留的，因此本地标识符应该是“蛇形”，即小写字母和下划线。许多非英语使用者也发现骆驼案难以解析

散列列表的参数化排序

问题描述投票：0回答：3

output

3个回答

output

最新问题

散列列表的参数化排序

问题描述 投票：0回答：3

output

3个回答

output

最新问题

问题描述投票：0回答：3