Print each word and its number of occurrences using pure `bash`

Question (votes: 0, answers: 2)

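My code is given below. I want to print each word and the number of times it occurs, without using external tools such as wc, awk, tr, etc.

I can count the total number of words, but I have a problem there as well: the output does not show the true total word count; the number is lower than it should be.

What should I do?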

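#!/bin/bash
#v=1

echo -n "ENTER FILE NAME: "
read file
IFS=$'\n'
# Command substitution strips trailing newlines, so cnew_line is an empty string.
# read -n1 also yields an empty string at a newline, so the test below still matches.
cnew_line=`echo -e "\n"`
cspace=`echo  " "`

if [ $# -ne 0 ]
then
    echo "You didn't enter a filename as a parameter"
    exit
elif [ $# -eq 0 ]
then
    filename="$file"

    num_line=0
    num_word=0
    num_char=0

    # Read the file one character at a time.
    while read -n1  w
    do
        if [ "$w" = "$cnew_line" ]
        then
            (( num_line++ ))
        elif [ "$w" = "$cspace" ]
        then
            # Only spaces are counted here, so the last word on each line is missed.
            (( num_word++ ))
        else
            (( num_char++ ))
        fi
    done < "$filename"

    echo "Line Number = $num_line"
    echo "Word Number = $num_word"
    echo "Character Number =$num_char"
fi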

bash shell
2 Answers
0 votes

You can use an associative array to count the words, something like this:

$ cat foo.sh
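#!/bin/bash

declare -A words

# Read standard input line by line; -r keeps backslashes literal.
while read -r line
do
    # Unquoted on purpose: $line is split into words on whitespace.
    for word in $line
    do
        ((words[$word]++))
    done
done

# Print every word with its count (iteration order is unspecified).
for i in "${!words[@]}"
do
    echo "$i:" "${words[$i]}"
done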

Test it:

$ echo this is a test is this | bash foo.sh
is: 2
this: 2
a: 1
test: 1
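
The question also asks why its total word count comes out too low: its loop increments the counter only when it reads a space, so the last word on each line is never counted. The following is a minimal sketch, not part of the original answer, that keeps the associative-array approach and adds a running total. The names `count_words`, `fields`, and `total` are illustrative assumptions, and `read -ra` is swapped in for the unquoted `for word in $line` so that words such as `*` are not glob-expanded.

```shell
#!/bin/bash
# Sketch: count each word and the overall total using only bash builtins.
# count_words, fields and total are hypothetical names chosen for this example.

count_words() {
    local line word
    local -a fields
    local -A words=()
    local total=0

    # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # read -ra splits the line on whitespace without glob expansion.
        read -ra fields <<< "$line"
        for word in "${fields[@]}"; do
            words[$word]=$(( ${words[$word]:-0} + 1 ))
            (( total += 1 ))
        done
    done

    # Per-word lines come out in unspecified order; the total is printed last.
    for word in "${!words[@]}"; do
        printf '%s: %d\n' "$word" "${words[$word]}"
    done
    printf 'total: %d\n' "$total"
}

# Demo on the same input as the test above.
count_words <<< 'this is a test is this'
```

For this input the per-word lines may appear in any order, and the final line is `total: 6`. To process a file instead, redirect it into the function, for example `count_words < "$filename"`.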

This answer was built almost entirely from these great answers: this and this. Don't forget to upvote them.


0 votes

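Two improved versions of James Brown's answer (they deal with punctuation attached to a word and with input that breaks up double- and single-quoted groups):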

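  1. Punctuation is treated as part of the word:

     #!/bin/bash
     declare -A words
     while read -r line ; do
         for word in ${line} ; do
             # ${word@Q} (bash 4.4+) quotes the word before it is used as a subscript.
             ((words[${word@Q}]++))
         done
     done
     for i in "${!words[@]}" ; do
         echo "${i}: ${words[$i]}"
     done

  2. Punctuation is not part of the word (like wc):

     #!/bin/bash
     declare -A words
     while read -r line ; do
         # Delete every punctuation character before splitting into words.
         line="${line//[[:punct:]]}"
         for word in ${line} ; do
             ((words[${word}]++))
         done
     done
     for i in "${!words[@]}" ; do
         echo "${i}: ${words[$i]}"
     done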

The code was tested with tricky quoted text:

  • fortune -m "swear" | bash foo.sh
  • man bash | ./foo.sh | sort -gr -k2 | head
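
The punctuation stripping used in the second variant can be looked at in isolation. The sketch below is an illustration only: `strip_punct` is a hypothetical helper name and the sample sentence is made up. It shows that `${line//[[:punct:]]/}` deletes every character in the `[:punct:]` class, so `"Hello,` and `Hello?` end up as the same key.

```shell
#!/bin/bash
# Sketch: remove punctuation from a line with a bash pattern substitution.
# strip_punct is a hypothetical helper name used only in this example.

strip_punct() {
    local line=$1
    # // replaces every match; the empty replacement deletes the character.
    line=${line//[[:punct:]]/}
    printf '%s\n' "$line"
}

strip_punct '"Hello," she said. Hello?'
# prints: Hello she said Hello
```

A side effect worth keeping in mind is that apostrophes and hyphens are punctuation too, so a word like `don't` would be counted as `dont`.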