How to count repeated sentences in shell

Question (votes: 1, answers: 3)
cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected output is as follows:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I found a way to solve the problem, but my code is rather noisy.

cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}'| sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' |sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I would like to know whether there is a more efficient way to achieve this. Thank you.

linux shell awk sed
3 Answers
1
vote

If your lines are not already grouped, you can use

awk '
    NR == FNR {count[$0]++; next} 
    !seen[$0]++ {
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt

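If the file is very large, this will consume a lot of memory, because both arrays hold every distinct line. You may want to sort the file first.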
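
As a minimal sketch (not part of the original answer), the same report can also be produced in a single pass, which helps when the input arrives on a pipe and cannot be read twice. The function name `report_dups` and the `order` array are made up for illustration. Memory use is still proportional to the number of distinct lines, so the caveat about very large files applies here too.

```shell
# Hypothetical helper: single-pass variant of the two-pass awk above.
# Reads the named file(s), or stdin when no argument is given.
report_dups() {
    awk '
        # Remember each distinct line in first-seen order.
        !($0 in count) { order[++n] = $0 }
        # Count every occurrence of the line.
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++) {
                print order[i]
                if (count[order[i]] > 1)
                    print "... repeated", count[order[i]], "times"
            }
        }
    ' "$@"
}
```

It can be called as `report_dups file1.txt` or as `some_command | report_dups`. Unlike the two-pass version, nothing is printed until the whole input has been read.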


2
votes

sort + uniq + sed solution:

sort file1.txt | uniq -c | sed -E 's/^ +1 (.+)/\1\n/; 
 s/^ +([2-9]|[0-9]{2,}) (.+)/\2\n<!!! pay attention, the above sentence has repeated \1 times !!!>\n/'

Output:

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, the above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, the above sentence has repeated 2 times !!!>

hig ...

Or with awk:

sort file1.txt | uniq -c | awk '{ n=$1; sub(/^ +[0-9]+ +/,""); 
printf "%s\n%s",$0,(n==1? ORS:"<!!! pay attention, the above sentence has repeated "n" times !!!>\n\n") }'

2
votes
$ awk '
    $0==prev { cnt++; next }
    { prt(); prev=$0; cnt=1 }
    END { prt() }
    function prt() {
        if (cnt) print prev (cnt>1 ? ORS "repeated " cnt " times" : "") ORS
    }
' file
abc bcd abc ...

abcd bcde cdef ...
repeated 3 times

efg fgh ...
repeated 2 times

hig ...
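
For reference, here is a hedged sketch of the same adjacent-run counting wrapped in a shell function and printing the exact message the question asks for. The names `mark_repeats` and `emit` are assumptions chosen for illustration. Like `uniq -c`, it only counts consecutive repeats, so a line that reappears later in unsorted input starts a new run.

```shell
# Hypothetical helper: count runs of identical adjacent lines.
# Reads the named file(s), or stdin when no argument is given.
mark_repeats() {
    awk '
        # Same line as the previous one: extend the current run.
        $0 == prev { cnt++; next }
        # New line: report the finished run, then start a new one.
        { emit(); prev = $0; cnt = 1 }
        END { emit() }
        function emit() {
            # cnt is unset before the first line, so nothing is printed then.
            if (!cnt) return
            print prev
            if (cnt > 1)
                print "<!!! pay attention, above sentence has repeated " cnt " times !!!>"
            print ""   # blank separator line after each group
        }
    ' "$@"
}
```

Running `mark_repeats file1.txt` on already grouped input gives one block per distinct line. For ungrouped input, `sort file1.txt | mark_repeats` groups the lines first, at the cost of losing the original order.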
© www.soinside.com 2019 - 2024. All rights reserved.