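I have a sample string in the input format below. I am trying to get the most repeated word(s) along with their occurrence count, as shown in the expected output format. How can we achieve this using the Java 8 streams API?
Input:
"Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms."
Expected output:
Ram -->3
is -->3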
String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
List<String> wordsList = Arrays.asList(text.split("[^a-zA-Z0-9]+"));
Map<String, Long> wordFrequency = wordsList.stream()
        .map(word -> word.toLowerCase())
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
long maxCount = Collections.max(wordFrequency.values());
Map<String, Long> maxFrequencyList = wordFrequency.entrySet().stream()
        .filter(e -> e.getValue() == maxCount)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
System.out.println(maxFrequencyList);
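IMO, streams are not a very efficient fit for this, because it is hard to extract and act on information that may or may not change while the stream is being processed (unless you write your own collector).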
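To illustrate what "write your own collector" could look like, here is a minimal sketch. It is not taken from any answer in this thread: the class name MaxEntriesCollectorSketch, the helper Acc, and the methods maxEntries and mostFrequent are hypothetical names chosen for the example. The collector keeps the highest count seen so far together with every key that reached it, so ties are preserved. Note that it still runs after a separate groupingBy pass; it only replaces the Collections.max plus filter step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class MaxEntriesCollectorSketch {

    // Mutable accumulator: the highest count seen so far and the keys that reached it.
    static final class Acc {
        long max = Long.MIN_VALUE;
        final List<String> keys = new ArrayList<>();

        void add(Map.Entry<String, Long> e) {
            long count = e.getValue();
            if (count > max) {          // new maximum: forget the earlier keys
                max = count;
                keys.clear();
                keys.add(e.getKey());
            } else if (count == max) {  // tie: remember this key as well
                keys.add(e.getKey());
            }
        }

        // Combines two partial results (only used by parallel streams).
        Acc merge(Acc other) {
            if (other.max > max) return other;
            if (other.max == max) keys.addAll(other.keys);
            return this;
        }
    }

    // Hypothetical collector: reduces (word, count) entries to the entries with the highest count.
    static Collector<Map.Entry<String, Long>, Acc, Map<String, Long>> maxEntries() {
        return Collector.of(Acc::new, Acc::add, Acc::merge, acc -> {
            Map<String, Long> result = new LinkedHashMap<>();
            for (String key : acc.keys) {
                result.put(key, acc.max);
            }
            return result;
        });
    }

    static Map<String, Long> mostFrequent(String text) {
        Map<String, Long> counts = Arrays.stream(text.split("[^a-zA-Z0-9]+"))
                .filter(w -> !w.isEmpty())   // split can yield a leading empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream().collect(maxEntries());
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        System.out.println(mostFrequent(text));
    }
}
```

For the sample sentence the result holds ram and is, each with a count of 3; the order of the two keys depends on HashMap iteration order and should not be relied on.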
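This approach uses the Java 8+ Map enhancements merge and computeIfAbsent. It also computes the word frequencies, including ties, in a single iteration. It does this by using two maps.
individualFrequencies (Map<String, Integer>): the number of occurrences of each word, keyed by the word.
equalFrequencies (Map<Integer, List<String>>): the words that share the same frequency, keyed by that frequency.
Each word's count is updated with merge. If the count returned by merge is greater than or equal to maxCount, the word is added to the list for that count obtained from the equalFrequencies map. If no list exists for that count yet, a new one is created and the word is added to it; Map.computeIfAbsent takes care of this. Note that this map can accumulate a lot of outdated entries as new ones are added. The only entry wanted at the end is the one retrieved with the maxCount key.
String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";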
int maxCount = 0;
Map<String, Integer> individualFrequencies = new HashMap<>();
Map<Integer, List<String>> equalFrequencies = new HashMap<>();
for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
    int count = individualFrequencies.merge(word, 1, Integer::sum);
    if (count >= maxCount) {
        maxCount = count;
        equalFrequencies
                .computeIfAbsent(count, v -> new ArrayList<>())
                .add(word);
    }
}
for (String word : equalFrequencies.get(maxCount)) {
    System.out.printf("%s --> %d%n", word, maxCount);
}
prints
ram --> 3
is --> 3
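Interestingly, not every word ends up in the equalFrequencies map. This depends on the order in which the words are processed. Once a word has been repeated, any word that comes later will not appear unless its count equals or exceeds the current maxCount.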
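The order dependence can be seen by dumping the whole map instead of only the maxCount entry. The following is a small sketch that reuses the loop from this answer; the class name EqualFrequenciesDemo and the method buildEqualFrequencies are hypothetical names introduced only for the demonstration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EqualFrequenciesDemo {

    // Same loop as in the answer, but the complete equalFrequencies map is returned for inspection.
    static Map<Integer, List<String>> buildEqualFrequencies(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new HashMap<>();
        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, k -> new ArrayList<>()).add(word);
            }
        }
        return equalFrequencies;
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        // For this sentence the map holds:
        //   1 -> [ram, is, employee, of, abc, company]
        //   2 -> [ram, is]
        //   3 -> [ram, is]
        // "from", "blore", "good", "in" and "algorithms" never appear, because
        // maxCount had already reached 2 or more when they were first seen.
        buildEqualFrequencies(sentence)
                .forEach((count, words) -> System.out.println(count + " -> " + words));
    }
}
```

The lists under keys 1 and 2 are the leftover entries the answer mentions; only the list under the final maxCount is meaningful.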
String[] words = {"tall","taller","tallest"};
// Create a map to store the word counts
Map<String, Long> wordCounts = Arrays.stream(words)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Find the most common word (if several words tie, only one of them is returned)
String mostCommonWord = wordCounts.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse("");
// Print the most common word
System.out.println("The most common word is: " + mostCommonWord);
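Since max returns a single entry, the snippet above reports only one word when several share the top count, which is the case for the three sample words (each occurs once). One possible way to keep all tied words, sketched here under the assumption that a second grouping pass is acceptable, is to regroup the counts by frequency into a TreeMap and take its last entry. The class name TopWordsSketch and the method mostCommonWords are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopWordsSketch {

    // Returns every word that shares the highest count (empty list for empty input).
    static List<String> mostCommonWords(String[] words) {
        // First pass: word -> number of occurrences.
        Map<String, Long> counts = Arrays.stream(words)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Second pass: count -> words with that count, sorted by count.
        TreeMap<Long, List<String>> byCount = counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));

        // The last entry of the sorted map holds the highest count.
        Map.Entry<Long, List<String>> top = byCount.lastEntry();
        if (top == null) {
            return new ArrayList<>();
        }
        return top.getValue();
    }

    public static void main(String[] args) {
        String[] words = {"tall", "taller", "tallest"};
        // All three words occur once, so all three are reported (in no guaranteed order).
        System.out.println("The most common words are: " + mostCommonWords(words));
    }
}
```

The order of the words inside the returned list follows HashMap iteration order, so callers that need a stable order should sort the result.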