Fill empty entries in a text file from the last non-empty entry, then continue with the new entry once one is found


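I have a very large text file in which some entries are missing. The pattern is consistent: the first line of each "section" has the correct entries, and every line after that initial line lacks them. I am trying to update each line that is missing these entries with the information from the initial line until a new "initial information line" is found, and then continue with the newly found data.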

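I built a solution in bash with the help of sed, but the process is very, very slow and takes hours to finish. I guess the delay comes from reading the file line by line, processing each line in bash, and writing it to a new file. My guess is that a sed script with variables, working on the file itself (-f), could speed the process up significantly. I am not an expert in these advanced uses of sed. I am also open to other suggestions or tools, as long as they can be called from a bash script, because this is part of an automation.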

Example input file:

"Initial line with more information like headers, unimportant, really only one line"
"Alpha","OldTheme","Some more text"
"","","Another rest text"
"","","Yet another text"
"Yadda","NewTheme","Crazy Text"
"","","More crazy text"

Expected result:

"Alpha","OldTheme","Some more text"
"Alpha","OldTheme","Another rest text"
"Alpha","OldTheme","Yet another text"
"Yadda","NewTheme","Crazy Text"
"Yadda","NewTheme","More crazy text"

Here is my working (but very slow) bash script:

#!/bin/bash
first=0
cat inputfile | \
while read line; do
        if [ ${first} -eq 0 ]; then
                first=1; continue
        fi
        partline=$(echo "${line}" | grep -o '","\(.*\)')
        newinitial=$(echo "${line}" | sed 's/",".*//; s/^"//')
        if [ ! -z "${newinitial}" ]; then
                initial=${newinitial}
        fi
        newtheme=$(echo "${partline}" | sed 's/^","//; s/",".*//')
        if [ ! -z "${newtheme}" ]; then
                theme=${newtheme}
        fi
        restline=$(echo ${partline} | sed 's/^","//' | grep -o '","\(.*\)')
        echo "\"${initial}\",\"${theme}${restline}"
done >outputfile
bash csv sed text-files
1 Answer

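A Perl one-liner (using the term loosely). It requires the Text::CSV_XS module, which can be installed through your OS's package manager (perl-Text-CSV_XS for OpenSUSE and RedHat, libtext-csv-xs-perl for the Debian family, etc.) or your favorite CPAN client.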

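% perl -MText::CSV_XS -e '
  print scalar <>; # Print header line
  my @saved;       # Last non-empty value seen in each column
  my $csv = Text::CSV_XS->new({binary => 1, always_quote => 1, empty_is_undef => 1});
  while (my $r = $csv->getline(*STDIN)) {
    for my $i (0 .. $#$r) {
      # empty_is_undef turns empty fields into undef, so test with defined;
      # a plain truth test would also treat a field containing 0 as empty
      if (defined $r->[$i]) {
        $saved[$i] = $r->[$i]
      } else {
        $r->[$i] = $saved[$i]
      }
    }
    $csv->say(*STDOUT, $r)
  }' < input.csv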
"Initial line with more information like headers, unimportant, really only one line"
"Alpha","OldTheme","Some more text"
"Alpha","OldTheme","Another rest text"
"Alpha","OldTheme","Yet another text"
"Yadda","NewTheme","Crazy Text"
"Yadda","NewTheme","More crazy text"

It works by saving each non-empty field in an array and using that saved value whenever an empty field is encountered.
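
For comparison, the same save-and-reuse idea can be sketched with awk called from bash, which avoids the extra Perl module and still processes the whole file in a single process instead of starting several subprocesses per line. This is only a sketch under assumptions that the question does not confirm: every field is double-quoted, no field contains the sequence "," or an escaped quote or a newline, and the first line is a header to be passed through unchanged. The function name fill_down is made up for this illustration. If the data can contain embedded quotes or commas, a real CSV parser such as the one above is the safer choice.

```shell
#!/bin/bash
# fill_down: hypothetical helper name, not from the original post.
# Reads CSV on stdin, prints the header line unchanged, then fills each
# empty field from the most recent non-empty value seen in the same column.
# Assumption: all fields are double-quoted and never contain the sequence
# quote-comma-quote, an escaped quote, or a newline.
fill_down() {
  awk '
    NR == 1  { print; next }           # pass the header line through untouched
    $0 == "" { print; next }           # leave blank lines as they are
    {
      line = $0
      sub(/^"/, "", line)              # strip the leading quote
      sub(/"$/, "", line)              # strip the trailing quote
      n = split(line, field, /","/)    # split on the quote-comma-quote separator
      if (n == 0) { n = 1; field[1] = "" }   # a row holding one empty field
      out = ""
      for (i = 1; i <= n; i++) {
        if (field[i] != "")
          saved[i] = field[i]          # remember the newest value for column i
        else
          field[i] = saved[i]          # reuse the remembered value
        out = out (i > 1 ? "\",\"" : "") field[i]
      }
      print "\"" out "\""              # restore the outer quotes
    }
  '
}
```

It would be called as fill_down < inputfile > outputfile. Like the Perl version, it keeps the header line in the output, so drop that line afterwards if it is not wanted.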
