I'm trying to unpack TLEs (Tagged Logical Elements) from an IBM AFP format file.
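The specification (http://www.afpcinc.org/wp-content/uploads/2017/12/MODCA-Reference-09.pdf) says these are two triplets (even though there are four values), with the following structure (byte offsets):
0: Length | 1: Tid | 2-n: Parameters (= 2: Type + 3: Format + 4-n: EBCDIC-encoded string)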
Example (two triplets, one for the name and one for the value):
0C 02 0B 00 C3 A4 99 99 85 95 83 A8 07 36 00 00 C5 E4 D9
12 KEY UID CHAR C u r r e n c y 7 VAL RESERVED E U R
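A minimal sketch of walking that layout, assuming the buffer contains only back-to-back triplets as in the example above. It uses each triplet's first byte (which counts itself) to find the next triplet, and prints the Tid and the raw parameter bytes without interpreting them. The variable names are illustrative only.

```perl
use strict;
use warnings;

# The example bytes from above, as hex text -> raw bytes.
my $bytes = pack('H*', '0C020B00C3A49999859583A807360000C5E4D9');

# Walk the buffer one triplet at a time. Byte 0 of each triplet is its
# total length, *including* that length byte.
my @triplets;
my $pos = 0;
while ($pos < length $bytes) {
    my $len = ord substr($bytes, $pos, 1);
    die "bad triplet length $len at offset $pos\n"
        if $len < 2 || $pos + $len > length $bytes;
    # Skip the length byte, read the Tid, keep the rest as raw parameters.
    my ($tid, $params) = unpack('x C a*', substr($bytes, $pos, $len));
    push @triplets, { length => $len, tid => $tid, params => $params };
    $pos += $len;
}

printf "length=%-2d tid=X'%02X' params=%s\n",
    $_->{length}, $_->{tid}, uc unpack('H*', $_->{params})
    for @triplets;
# length=12 tid=X'02' params=0B00C3A49999859583A8
# length=7  tid=X'36' params=0000C5E4D9
```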
I parsed it in Perl like this (and it works):

```perl
use Encode qw(decode);   # provides decode('cp500', ...)

if ($key eq 'Data') {
    my $tle      = $member->{struct}->{$key};
    my $k_length = hex(unpack('H2', substr($tle, 0, 1)));
    my $key      = decode('cp500', substr($tle, 4, $k_length - 4));
    my $v_length = hex(unpack('H2', substr($tle, $k_length, 1)));
    my $value    = decode('cp500', substr($tle, $k_length + 4, $v_length - 4));
    print("'$key' => '$value'\n");
}
```
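For comparison, a sketch showing that the `hex(unpack('H2', ...))` idiom above just reads one unsigned byte, which `ord` or the `C` template does directly. It also uses `@` (absolute position) in the template to read the second length byte. It assumes the same sample bytes as the example and that both triplets have a 4-byte header.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $tle = pack('H*', '0C020B00C3A49999859583A807360000C5E4D9');

# Three equivalent ways to read the first byte as an unsigned number.
my $via_hex = hex(unpack('H2', substr($tle, 0, 1)));  # as in the question
my $via_ord = ord(substr($tle, 0, 1));
my $via_C   = unpack('C', $tle);                      # 'C' = unsigned 8-bit
printf "%d %d %d\n", $via_hex, $via_ord, $via_C;      # 12 12 12

# Same extraction as the question, using 'C' and '@' (jump to offset).
my $k_length = unpack('C', $tle);
my $v_length = unpack("\@$k_length C", $tle);         # byte at offset 12
my $key   = decode('cp500', substr($tle, 4, $k_length - 4));
my $value = decode('cp500', substr($tle, $k_length + 4, $v_length - 4));
print "'$key' => '$value'\n";                         # 'Currency' => 'EUR'
```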
Result:
'Currency' => 'EUR'
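Although this works, I feel my approach is a bit too convoluted and that there is a more efficient way to do it. For example, does the pack template support reading the first n bytes and using them as the count of how many following bytes to unpack? I read the Perl pack tutorial but couldn't find anything like that.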
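If the length field didn't include itself, you could do the following:

```perl
# "C/a" reads a one-byte count, then that many bytes as a string.
(my $record, $unparsed) = unpack("C/a a*", $unparsed);
# Skip Tid, Type and Format (3 bytes); the rest is the EBCDIC text.
my $key = decode("cp500", unpack("x3 a*", $record));
```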
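To see the `/` construct in isolation, here is a small sketch on made-up strings (not AFP data) whose length bytes do not count themselves: `pack` writes the count prefix and `unpack` reads it back.

```perl
use strict;
use warnings;

# Made-up data: each string gets a length byte that does NOT count itself.
# In pack, "C/a*" writes the string's length as a C, then the string.
my $packed = pack('C/a* C/a*', 'abc', 'de');
print uc(unpack('H*', $packed)), "\n";   # 03616263026465

# In unpack, "C/a" reads a C and uses it as the count for the following "a";
# the ()* group repeats until the data runs out.
my @strings = unpack('(C/a)*', $packed);
print "@strings\n";                      # abc de
```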
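But the length field does include itself, so:

```perl
# Read the length byte, which counts itself...
(my $length, $unparsed) = unpack("C a*", $unparsed);
# ...so the rest of this triplet is $length - 1 bytes.
(my $record, $unparsed) = unpack("a".($length-1)." a*", $unparsed);
my $key = decode("cp500", unpack("x3 a*", $record));
```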
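Wrapped in a loop, the same approach handles any number of triplets. This is a sketch only: the sub name `parse_triplets` is hypothetical, and turning the result into a hash assumes the triplets always come in name/value pairs as in the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded text of every triplet, in order.
sub parse_triplets {
    my ($unparsed) = @_;
    my @texts;
    while (length $unparsed) {
        # The length byte counts itself...
        (my $length, $unparsed) = unpack('C a*', $unparsed);
        die "triplet length $length is too short\n" if $length < 4;
        die "truncated triplet\n" if length($unparsed) < $length - 1;
        # ...so the rest of this triplet is $length - 1 bytes.
        (my $record, $unparsed) = unpack('a' . ($length - 1) . ' a*', $unparsed);
        # Skip the Tid plus the two bytes after it (Type/Format or reserved).
        push @texts, decode('cp500', unpack('x3 a*', $record));
    }
    return @texts;
}

# Assumes the triplets alternate name, value, name, value, ...
my %attr = parse_triplets(pack('H*', '0C020B00C3A49999859583A807360000C5E4D9'));
print "'$_' => '$attr{$_}'\n" for sort keys %attr;   # 'Currency' => 'EUR'
```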
Please see if the following demo code meets your requirements.
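This code:

- defines a hash of decoder subroutines
- reads the hex representation of the bytes OP provided, from the `DATA` block
- uses `pack` to convert that data into its binary representation `$bytes`
- extracts the length and key with `unpack`
- calls the decoder subroutine for that particular key, which returns a hash made of two arrays, `keys` and `vals`
- builds a new hash `%data` from the returned keys and vals
- outputs the keys and values (the returned `keys` array is used to preserve the byte/field order)

Note: Encode's `from_to` is used to decode the EBCDIC text, as an alternative to `decode`.

```perl
use strict;
use warnings;
use feature 'say';
use utf8;

use Encode 'from_to';

my $debug = 1;

# Dispatch table: key (second byte of the record) => decoder subroutine
my %decoder = (
    1 => \&decode_type1,
    2 => \&decode_currency,
    3 => \&decode_type3,
    4 => \&decode_type4,
    5 => \&decode_type5
);

my $bytes = read_bytes();
my ($len, $key) = unpack('C2', $bytes);
my $data = $decoder{$key}($bytes);

# Build a hash from the returned parallel arrays
my %data;
@data{@{$data->{keys}}} = @{$data->{vals}};

say ' Unpacked data ---------------';
printf "%-8s => %s\n", $_, $data{$_} for @{$data->{keys}};

sub read_bytes {
    my $hex_bytes = <DATA>;
    chomp $hex_bytes;
    my $bytes = pack('H*', $hex_bytes);   # hex text -> raw bytes
    return $bytes;
}

sub show_bytes {
    my $data = shift;
    print "Bytes: ";
    printf "%02X ", $_ for unpack 'C*', $data;
    print "\n";
}

sub decode_currency {
    my $bytes = shift;
    show_bytes($bytes) if $debug;
    my @keys = qw/length_1 key uid char data_1 length_2 val reserved data_2/;
    # C4: four single bytes, A8: 8-byte string, C2: two single bytes,
    # S: unsigned 16-bit (native byte order), A3: 3-byte string
    my @vals = unpack('C4A8C2SA3', $bytes);
    # Convert the two EBCDIC strings to Latin-1 in place
    from_to($vals[4], 'cp37', 'latin1');
    from_to($vals[8], 'cp37', 'latin1');
    return { keys => \@keys, vals => \@vals };
}

# Placeholder decoders (array refs, so the hash slice above works for them too)
sub decode_type1 { my $bytes = shift; return { keys => ['type1'], vals => ['vals1'] }; }
sub decode_type3 { my $bytes = shift; return { keys => ['type3'], vals => ['vals3'] }; }
sub decode_type4 { my $bytes = shift; return { keys => ['type4'], vals => ['vals4'] }; }
sub decode_type5 { my $bytes = shift; return { keys => ['type5'], vals => ['vals5'] }; }

__DATA__
0C020B00C3A49999859583A807360000C5E4D9
```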
Output

```
Bytes: 0C 02 0B 00 C3 A4 99 99 85 95 83 A8 07 36 00 00 C5 E4 D9
 Unpacked data ---------------
length_1 => 12
key      => 2
uid      => 11
char     => 0
data_1   => Currency
length_2 => 7
val      => 54
reserved => 0
data_2   => EUR
```
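Building on the dispatch-table idea above, a sketch that dispatches on each triplet's Tid (rather than on the second byte of the whole record), so every triplet is decoded in turn. The `%handler` table and `decode_tle` are hypothetical; treating X'02' as the name and X'36' as the value follows the annotated example in the question, and both handlers assume two bytes sit between the Tid and the EBCDIC text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical handlers keyed by Tid. Each receives the parameter bytes
# (everything after the Tid) and returns (field_name, text).
my %handler = (
    0x02 => sub { return (name  => decode('cp500', unpack('x2 a*', $_[0]))) },  # skip Type, Format
    0x36 => sub { return (value => decode('cp500', unpack('x2 a*', $_[0]))) },  # skip 2 reserved bytes
);

sub decode_tle {
    my ($bytes) = @_;
    my %out;
    while (length $bytes) {
        my $len = unpack('C', $bytes);               # self-inclusive length
        die "bad triplet length $len\n" if $len < 2 || $len > length $bytes;
        my ($tid, $params) = unpack('x C a' . ($len - 2), $bytes);
        substr($bytes, 0, $len, '');                 # drop the triplet just read
        unless (exists $handler{$tid}) {
            warn sprintf("skipping unknown Tid X'%02X'\n", $tid);
            next;
        }
        my ($field, $text) = $handler{$tid}->($params);
        $out{$field} = $text;
    }
    return \%out;
}

my $result = decode_tle(pack('H*', '0C020B00C3A49999859583A807360000C5E4D9'));
printf "%-5s => %s\n", $_, $result->{$_} for qw(name value);
# name  => Currency
# value => EUR
```

Unknown Tids are skipped with a warning rather than treated as fatal; whether that is appropriate depends on how strictly the input should be validated.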
Note: `val` occupies only one byte, which limits the EUR amount to the range 0..255. Perhaps the `reserved` bytes are part of the `val` amount of EUR.