Bash: parsing text with multiple nested conditions

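I have the following code, which checks for lines with 10 or more words and splits them at the first comma (", "). It repeats the process, so any newly split line that still has 10 or more words and a comma is split again. In the end, no line has both 10 or more words and a comma.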

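How can I edit this code to do the following: after all the comma splitting is done (which the current code already does), check whether any resulting line still has 10 or more words and, if so, split it at the first occurrence of " and " (with spaces), so that "and" starts the new line?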

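#!/usr/bin/env bash

input=input.txt
# Temporary file next to the input; removed when the script exits
temp=$(mktemp "${input}.XXXX")
trap 'rm -f "$temp"' EXIT

# awk exits 0 if it split at least one line and 1 otherwise, so the loop
# repeats until no line with 10 or more words still contains ", "
while awk '
  BEGIN { retval=1 }
  NF >= 10 && /, / {
    sub(/, /, ","ORS)   # keep the comma, turn the following space into a newline
    retval=0
  }
  1                     # print every (possibly modified) line
  END { exit retval }
' "$input" > "$temp"; do
  mv -v "$temp" "$input"
done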
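
The piece split off in front always ends in a bare comma with no space after it, so it can never match /, / again; only the remainder may need another split. Under that observation, the repeated whole-file passes can be folded into a single awk pass with an inner while loop. This is a sketch, not the asker's code: the function name split_at_commas is hypothetical, and it writes to stdout instead of rewriting input.txt in place.

```shell
# Hypothetical helper: reads the named files (or stdin) and writes the
# comma-split result to stdout; same rule as the loop above, in one pass.
split_at_commas() {
  awk '
  {
    # While the remaining text has 10 or more words and contains ", ",
    # print it up to and including the first comma, then continue with
    # the rest (the space after the comma is dropped, as with sub()).
    while (NF >= 10 && (i = index($0, ", ")) > 0) {
      print substr($0, 1, i)
      $0 = substr($0, i + 2)   # assigning $0 re-splits the fields and updates NF
    }
    print   # the last piece, or the unchanged line
  }' "$@"
}
```

Usage: split_at_commas input.txt > output.txt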

Sample input:

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16 

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11 and Word12 Word13 Word14 Word15 

Word1 Word2 Word3 Word4 and Word5

Desired output:

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9

Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10 Word11

Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10,
Word11 Word12 Word13 Word14 Word15 Word16

Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10 Word11
and Word12 Word13 Word14 Word15

Word1 Word2 Word3 Word4 and Word5

Thank you!
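
One possible way to add the " and " rule is to keep the question's fixed-point loop and run a second loop of the same shape once the comma loop has finished. This is a sketch under two assumptions: the " and " rule uses the same NF >= 10 threshold as the comma rule, and "and" should start the new line, as in the desired output. The function name split_long_lines is hypothetical.

```shell
# Hypothetical helper: split_long_lines FILE rewrites FILE in place.
split_long_lines() {
  local input=$1 temp
  # Temporary file in the same directory as FILE, as in the question
  temp=$(mktemp "${input}.XXXX") || return 1

  # Pass 1 (the question's loop): split at the first ", " until no line
  # with 10 or more words still contains ", ".
  while awk '
    BEGIN { retval = 1 }
    NF >= 10 && /, / { sub(/, /, "," ORS); retval = 0 }
    1
    END { exit retval }
  ' "$input" > "$temp"; do
    mv "$temp" "$input"
  done

  # Pass 2 (the new rule): split long lines before the first " and ".
  # A new line that starts with "and" has no leading space, so the same
  # "and" cannot match / and / again and the loop terminates.
  while awk '
    BEGIN { retval = 1 }
    NF >= 10 && / and / { sub(/ and /, ORS "and "); retval = 0 }
    1
    END { exit retval }
  ' "$input" > "$temp"; do
    mv "$temp" "$input"
  done

  rm -f "$temp"   # the last awk run leaves an unchanged copy behind
}
```

Usage: split_long_lines input.txt. Blank separator lines have NF = 0, so they pass through both loops unchanged. Because pass 2 only ever cuts lines that pass 1 has already left without ", ", no new comma splits can become necessary afterwards.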

Tags: bash, parsing, text, nested, multiple-conditions
1 Answer

Is this what you're expecting?

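echo "Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21 and Word22 Word23 Word24." | grep -oE '[a-zA-Z0-9,.]+' | awk '
BEGIN {
    cnt = 0
}
{
    # Append the current word (grep -o prints each match on its own line)
    str = str " " $0
    if ($0 ~ /,$/){
        # Word ends with a comma: end the output line here
        print str
        cnt = 0
        str = ""
    }
    else if (cnt < 10){
        # At most 10 words collected so far: keep going
        cnt++
    }
    else {
        # 11th word reached: end the output line
        print str
        cnt = 0
        str = ""
    }
} END {
    # Print whatever is left over
    print str
}' | sed 's/^ *//'

Output: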
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10,
Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21
and Word22 Word23 Word24.
© www.soinside.com 2019 - 2025. All rights reserved.