How to find the most common words in a string using Java 8 streams?

Question

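I have a sample string in the input format below. I am trying to get the most repeated words together with their occurrence counts, as shown in the expected output. How can this be done with the Java 8 Stream API?

Input: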

"Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms."

Expected output:

Ram -->3
is -->3

Answer 1

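    // Needs java.util.*, java.util.function.Function and java.util.stream.Collectors.
    String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";

    // Split on runs of non-alphanumeric characters, so "RAM!" yields "RAM".
    List<String> wordsList = Arrays.asList(text.split("[^a-zA-Z0-9]+"));

    // Count each word case-insensitively.
    Map<String, Long> wordFrequency = wordsList.stream().map(String::toLowerCase)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    // Highest count reached by any word.
    long maxCount = Collections.max(wordFrequency.values());

    // Keep every word whose count equals the maximum, so ties are preserved.
    Map<String, Long> maxFrequencyList = wordFrequency.entrySet().stream().filter(e -> e.getValue() == maxCount)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

    // Prints the entries ram=3 and is=3 in lower case; the map order is not guaranteed.
    System.out.println(maxFrequencyList);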
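
The snippet above collects into plain hash maps, so the order of the printed words is not guaranteed. As a minimal sketch, assuming first-seen order is wanted, the same pipeline can be wrapped in a method that uses `LinkedHashMap` for both collections. The class name `MostFrequentWords` and the method name `mostFrequent` are made up for this illustration, and the words are still reported in lower case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MostFrequentWords {

    // Hypothetical helper: returns every word that shares the highest count,
    // in the order the words first appear in the text.
    static Map<String, Long> mostFrequent(String text) {
        Map<String, Long> counts = Arrays.stream(text.split("[^a-zA-Z0-9]+"))
                .filter(word -> !word.isEmpty())   // split can yield an empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));

        if (counts.isEmpty()) {
            return Collections.emptyMap();
        }
        long maxCount = Collections.max(counts.values());

        // Keep only the entries whose count equals the maximum.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == maxCount)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        // Prints "ram -->3" and then "is -->3".
        mostFrequent(text).forEach((word, count) -> System.out.println(word + " -->" + count));
    }
}
```

The merge function `(a, b) -> a` is never actually invoked here because the keys are already unique; it is only required by the four-argument `Collectors.toMap` overload that accepts a map supplier.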

Answer 2

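IMO, streams are not very efficient for this, because it is hard to extract and apply useful information that may or may not change while the stream is being processed (unless you write your own collector).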
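
For comparison, here is a rough sketch of what "writing your own collector" could look like. It is not part of this answer's method: the class `MaxFrequencyCollector` and the factory method `mostFrequent` are hypothetical names, and the collector simply counts words in a map and picks the top ones in its finisher, so it still makes a second pass over the counts at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

class MaxFrequencyCollector {

    // Hypothetical collector: gathers words and returns those with the highest count.
    static Collector<String, Map<String, Integer>, List<String>> mostFrequent() {
        return Collector.of(
                LinkedHashMap::new,                                     // supplier: word -> count
                (counts, word) -> counts.merge(word, 1, Integer::sum),  // accumulator
                (left, right) -> {                                      // combiner for parallel streams
                    right.forEach((word, n) -> left.merge(word, n, Integer::sum));
                    return left;
                },
                counts -> {                                             // finisher: keep the top words
                    int max = counts.values().stream()
                            .mapToInt(Integer::intValue).max().orElse(0);
                    List<String> top = new ArrayList<>();
                    counts.forEach((word, n) -> {
                        if (n == max) {
                            top.add(word);
                        }
                    });
                    return top;
                });
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        List<String> top = Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(word -> !word.isEmpty())
                .collect(mostFrequent());
        System.out.println(top); // prints [ram, is]
    }
}
```

This sketch returns only the words, not their counts; returning a map of word to count from the finisher would be a straightforward variation.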

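The approach in this answer uses Java 8+ Map enhancements such as `merge` and `computeIfAbsent`. It also computes the word frequencies, including ties, in a single iteration. It does this by using two maps.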

- `individualFrequencies` - a map of each word's occurrence count, keyed by the word.
- `equalFrequencies` - a map of the words that share the same frequency, keyed by that frequency.

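The Map.merge method is used to compute the frequencies in a `Map<String, Integer>`. The other map collects all the words that have a given frequency. It is declared as `Map<Integer, List<String>>`.

If the count returned by `merge` is greater than or equal to `maxCount`, the word is added to the list for that count obtained from the `equalFrequencies` map. If no list exists for that count yet, a new one is created and the word is added to it. Map.computeIfAbsent facilitates this. Note that as new entries are added, this map may accumulate a lot of stale garbage. The only entry wanted at the end is the one retrieved with the `maxCount` key.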

    // Needs java.util.ArrayList, HashMap, List and Map.
    String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";

    int maxCount = 0;
    Map<String, Integer> individualFrequencies = new HashMap<>();
    Map<Integer, List<String>> equalFrequencies = new HashMap<>();

    for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
        // merge returns the updated count for this word
        int count = individualFrequencies.merge(word, 1, Integer::sum);
        if (count >= maxCount) {
            maxCount = count;
            // record the word under its current count, creating the list if needed
            equalFrequencies
                    .computeIfAbsent(count, v -> new ArrayList<>())
                    .add(word);
        }
    }

    // only the list stored under maxCount holds the final answer
    for (String word : equalFrequencies.get(maxCount)) {
        System.out.printf("%s --> %d%n", word, maxCount);
    }

Prints

ram --> 3
is --> 3

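Interestingly, not all words end up in the `equalFrequencies` map. This behavior is determined by the order in which the words are processed. Once a word has been repeated, no later word is added unless its count equals or exceeds the current maxCount.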
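
To make that concrete, the following sketch reruns the same loop on the sample sentence and prints the whole `equalFrequencies` map together with the words that never reach it. The class name `EqualFrequenciesDemo` and the helper `missingWords` are invented for this illustration, and a `TreeMap` is assumed here only so that the entries print in key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class EqualFrequenciesDemo {

    // Same loop as in the answer; TreeMap is an assumption made for ordered printing.
    static Map<Integer, List<String>> buildEqualFrequencies(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new TreeMap<>();
        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, k -> new ArrayList<>()).add(word);
            }
        }
        return equalFrequencies;
    }

    // Hypothetical helper: words of the sentence that appear in none of the lists.
    static Set<String> missingWords(String sentence, Map<Integer, List<String>> equalFrequencies) {
        Set<String> missing = new LinkedHashSet<>(
                Arrays.asList(sentence.toLowerCase().split("[!;:,.\\s]+")));
        equalFrequencies.values().forEach(missing::removeAll);
        return missing;
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        Map<Integer, List<String>> equalFrequencies = buildEqualFrequencies(sentence);

        // prints {1=[ram, is, employee, of, abc, company], 2=[ram, is], 3=[ram, is]}
        System.out.println(equalFrequencies);

        // prints [from, blore, good, in, algorithms]
        System.out.println(missingWords(sentence, equalFrequencies));
    }
}
```

The entries under keys 1 and 2 are the stale leftovers mentioned earlier; only the list stored under the final `maxCount` (3 here) is the answer.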

Answer 3

    String[] words = {"tall", "taller", "tallest"};

    // Create a map to store the word counts
    Map<String, Long> wordCounts = Arrays.stream(words)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    // Find the most common word; if several words tie, only one of them is returned
    String mostCommonWord = wordCounts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse("");

    // Print the most common word
    System.out.println("The most common word is: " + mostCommonWord);
