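I have the following code, which checks for lines of more than 10 words and splits each one at the first comma. It repeats the process, so every newly split line that still has more than 10 words and a comma is split as well (in the end, no line has both more than 10 words and a comma).
How can I edit this code to do the following: once all the comma splits are done (which the current code already does), check whether the resulting lines still contain more than 10 words and split them at the first occurrence of " and " (with the surrounding spaces)?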
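#!/usr/bin/env bash
input=input.txt
temp=$(mktemp "${input}.XXXX")
trap 'rm -f "$temp"' 0
# Rerun awk over the file until a pass makes no split (awk then exits 1).
while awk '
    BEGIN { retval=1 }
    # Line has at least 10 fields and contains ", ": break at the first comma.
    NF >= 10 && /, / {
        sub(/, /, ","ORS)
        retval=0
    }
    1
    END { exit retval }
' "$input" > "$temp"; do
    mv -v "$temp" "$input"
done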
Sample input:
Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9
Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11
Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16
Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11 and Word12 Word13 Word14 Word15
Word1 Word2 Word3 Word4 and Word5
Desired output:
Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10 Word11
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10,
Word11 Word12 Word13 Word14 Word15 Word16
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10 Word11
and Word12 Word13 Word14 Word15
Word1 Word2 Word3 Word4 and Word5
Thank you!
Is this the output you expect?
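echo "Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21 and Word22 Word23 Word24." | grep -oE '[a-zA-Z0-9,.]+' | awk '
BEGIN {
    cnt = 0
}
{
    # grep -o emits one word per line; collect the words into str.
    str = str " " $0
    if ($0 ~ /,$/) {
        # A word ending in a comma closes the current output line.
        print str
        cnt = 0
        str = ""
    }
    else if (cnt < 10) {
        cnt++
    }
    else {
        # Eleventh word since the last break: flush the line here.
        print str
        cnt = 0
        str = ""
    }
}
END {
    # Print whatever is left over.
    print str
}' | sed 's/^ *//'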
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10,
Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21
and Word22 Word23 Word24.
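The pipeline above breaks lines at commas and by word count; it never looks for " and ", so the break before "and" in its output comes from the word counter. If the split should instead happen at the first " and " once the comma splits are done, one possible approach is a single awk pass per line, sketched below. The function name split_long_lines, the helpers nwords and break_at, and the min_words variable are made up for this sketch. The threshold of 10 copies the NF >= 10 test from the question, and repeating the " and " split while a piece stays long is an assumption that mirrors the comma behavior.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads lines on stdin, writes the split lines to stdout.
split_long_lines() {
  awk -v min_words=10 '
    # Number of whitespace-separated words in s.
    function nwords(s,    tmp) { return split(s, tmp) }

    # While the remaining text has at least min words and contains sep,
    # break it at the first sep: keep ends the finished piece and lead
    # starts the next one.
    function break_at(line, sep, keep, lead, min,    out, pos) {
      out = ""
      while (nwords(line) >= min && (pos = index(line, sep)) > 0) {
        out = out substr(line, 1, pos - 1) keep "\n"
        line = lead substr(line, pos + length(sep))
      }
      return out line
    }

    # Pass empty lines through unchanged.
    $0 == "" { print; next }

    {
      # Step 1: comma splits, the comma stays on the upper line.
      n = split(break_at($0, ", ", ",", "", min_words), piece, "\n")
      # Step 2: each resulting piece is then checked for " and ".
      for (i = 1; i <= n; i++)
        print break_at(piece[i], " and ", "", "and ", min_words)
    }
  '
}

# Demo on one line from the sample input.
printf '%s\n' 'Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11 and Word12 Word13 Word14 Word15' | split_long_lines
```

To process a file, something like `split_long_lines < input.txt > output.txt` should work. Because each line is handled completely in memory, the temp file and the while loop from the question are not needed in this variant.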