Regex to find consecutive words [closed]

Question

I want to display and compare, in Bash, the number of occurrences of every word that follows the word 'The'.

Example:

The next generation will be ruled by the smartphones. The next thing is interesting to watch.The question is how do we solve this problem

So the expected output is:

next                   2
smartphones            1
question               1

Here is the command I tried:

cat file.txt | tr A-Z a-z |grep 'the '  | cut -d\  -f2| sort |uniq -c|sort -nr

But this command gives me inaccurate results: its output includes words that do not actually follow 'the'.
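The attempt misbehaves because `grep 'the '` prints each whole matching line, and `cut -d\  -f2` then takes the second space-separated field of that line instead of the word after each `the`. On the one-line sample it therefore reports a single `next`. In a multi-line file it reports the second word of every line that contains `the `. The following minimal sketch keeps the asker's `tr` approach but first puts one word per line, so each word can be compared with the word before it. It assumes ASCII letters only and treats any non-letter as a separator. The function name `count_after_the_tr` is hypothetical.

```shell
#!/bin/sh
# count_after_the_tr is a hypothetical helper name, not an existing tool.
# It reads the files named as arguments, or stdin when none are given.
count_after_the_tr() {
  cat "$@" |
    # Lower-case the text, as the original pipeline did.
    tr '[:upper:]' '[:lower:]' |
    # Replace every run of non-letters with one newline: one word per line.
    tr -cs '[:lower:]' '\n' |
    # Count a word only when the word on the previous line was "the".
    awk 'prev == "the" { count[$0]++ } { prev = $0 }
         END { for (w in count) print w, count[w] }' |
    # for (w in count) has no defined order, so sort for stable output.
    sort
}
```

Usage: `count_after_the_tr file.txt`. Because `prev` survives from one awk record to the next, a `the` at the end of one input line is still paired with the first word of the following line.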

Answer 1

Using GNU grep:

grep -Poi 'the \K\w.*?\b' file | sort | uniq -c | awk '{print $2,$1}'

Or:

grep -Poi 'the \K\w.*?\b' file | awk '{count[$1]++}END{for(j in count) print j, count[j]}'

Output:

next 2
question 1
smartphones 1
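The following is a slightly tightened variant of the `grep -P` approach. It is a sketch that assumes GNU grep built with PCRE support. `\bthe` matches `the` only as a whole word, whereas the original pattern also matches the tail of words such as `breathe`. `\s+` tolerates more than one space. Lower-casing the matches before counting merges `Next` and `next`. The function name `count_after_the_grep` is hypothetical.

```shell
#!/bin/sh
# count_after_the_grep is a hypothetical helper name; it needs GNU grep with -P.
count_after_the_grep() {
  # \bthe : "the" as a whole word, so "breathe next" is not counted
  # \s+   : one or more whitespace characters after it
  # \K    : discard what has matched so far from the reported match
  # \w+   : the following word, which is all that -o prints
  # -h    : no file-name prefixes when several files are given
  grep -hPoi '\bthe\s+\K\w+' "$@" |
    tr '[:upper:]' '[:lower:]' |   # merge "Next" and "next"
    sort | uniq -c |               # one "count word" line per distinct word
    awk '{ print $2, $1 }'         # swap to "word count"
}
```

On the sample text, `count_after_the_grep file.txt` prints `next 2`, `question 1` and `smartphones 1`. Like the original, this variant works line by line, so a `the` at the end of a line is not paired with the next line's first word.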

Answer 2

You don't need grep or tr; GNU awk alone is enough for this task.

$ awk -F"[ [:punct:]]" '{for(i=2; i<=NF; i++) if($(i-1) ~ /^[Tt]he$/) a[$i]++} END{for(i in a) print i,a[i]}' file
next 2
question 1
smartphones 1

`if($(i-1) ~ /^[Tt]he$/)`: if the previous field matches `the` or `The`, the current field is counted in the associative array `a` (`a[$i]++`).
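The awk script above pairs words only within a single line, and it recognises only `the` and `The`. The following minimal portable sketch takes a different approach. It lower-cases each line and splits it on runs of non-letters with POSIX `split()`. It also carries the previous word across line breaks. It assumes ASCII text, and `count_after_the_awk` is a hypothetical name.

```shell
#!/bin/sh
# count_after_the_awk is a hypothetical helper name; plain POSIX awk is enough.
count_after_the_awk() {
  awk '
    {
      # Lower-case the line and split it on runs of non-letters (ASCII only).
      n = split(tolower($0), words, "[^a-z]+")
      for (k = 1; k <= n; k++) {
        if (words[k] == "") continue   # empty piece from a leading or trailing separator
        if (prev == "the") count[words[k]]++
        prev = words[k]                # kept across lines, so "the" at a line end still pairs
      }
    }
    END { for (w in count) print w, count[w] }
  ' "$@" | sort
}
```

The trailing `sort` is there because `for (w in count)` visits keys in an unspecified order, in this sketch as in the original answer.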
