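There is a feed I pull data from, and sometimes it contains very similar records: https://dl4.joxi.net/drive/2020/01/17/0028/2950/1842054/54/5abb738180.jpg
I need to make sure the array contains only unique records (uniqueness is determined by the title).
Code: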
$new = array();      // normalized titles of the first 50 items
$goodFeed = array(); // items considered unique
$itemlimit = 0;
$itemlimit2 = 0;
// First pass: collect the lowercased, trimmed titles of the first 50 items.
foreach ($feed->get_items() as $item) {
    if ($itemlimit == 50) { break; }
    $new[] = strtolower(trim($item->get_title()));
    $itemlimit = $itemlimit + 1;
}
// Second pass: compare each item's title with every collected title.
foreach ($feed->get_items() as $item) {
    if ($itemlimit2 == 50) { break; }
    $itemTitle = strtolower(trim($item->get_title()));
    foreach ($new as $item2) {
        similar_text($item2, $itemTitle, $percent);
        if ($percent < 78 && !in_array($item, $goodFeed)) {
            $goodFeed[] = $item;
            echo 'added: ' . $item->get_title() . '<br>Percent: ' . $percent . '<hr>';
        }
    }
    $itemlimit2 = $itemlimit2 + 1;
}
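The loop above can be replayed on plain strings to see why the near-duplicates slip through: each title is compared with every collected title, including unrelated ones, so a single low score is enough to add it. Below is a minimal sketch with a hypothetical buggyFilter() helper that takes title strings instead of feed items. Using strings is an assumption made so the sketch runs without any RSS library.

```php
<?php
// Hypothetical helper: replays the two-pass loop above on plain title
// strings instead of feed items, so it runs without any RSS library.
function buggyFilter(array $titles, float $threshold = 78): array
{
    // First pass: every normalized title goes into $new.
    $new = array();
    foreach ($titles as $t) {
        $new[] = strtolower(trim($t));
    }

    // Second pass: each title is compared with ALL titles in $new.
    $goodFeed = array();
    foreach ($titles as $t) {
        $itemTitle = strtolower(trim($t));
        foreach ($new as $other) {
            similar_text($other, $itemTitle, $percent);
            // One dissimilar title anywhere in the feed is enough to add $t.
            if ($percent < $threshold && !in_array($t, $goodFeed, true)) {
                $goodFeed[] = $t;
            }
        }
    }
    return $goodFeed;
}

$titles = array(
    'Metro Redux on Nintendo Switch™ Announce Trailer',
    'Metro Redux on Nintendo Switch™ Announce Trailer [NA]',
    'The Elder Scrolls Online: The Dark Heart of Skyrim Announcement Cinematic',
    'The Elder Scrolls Online - The Dark Heart of Skyrim Cinematic Announcement Trailer',
);

echo count(buggyFilter($titles)), "\n"; // prints 4: nothing is filtered out
```

The usual fix is to compare each title only against titles that have already been kept, and to skip it when any of those scores above the threshold.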
I only want unique values (using a similarity threshold of about 80%) to remain in the $goodFeed array. Right now it contains elements that are very similar to each other. The original feed has elements with these titles:
1. Metro Redux on Nintendo Switch™ Announce Trailer
2. Metro Redux on Nintendo Switch™ Announce Trailer [NA]
3. Metro Redux für Nintendo Switch™ Ankündigungs-Trailer [DE]
4. Metro Redux on Nintendo Switch™ Announce Trailer [ANZ]
5. The Elder Scrolls Online: The Dark Heart of Skyrim Announcement Cinematic
6. The Elder Scrolls Online - The Dark Heart of Skyrim Cinematic Announcement Trailer
All of them end up in $goodFeed, but I only need these:
1. Metro Redux on Nintendo Switch™ Announce Trailer
5. The Elder Scrolls Online: The Dark Heart of Skyrim Announcement Cinematic
Thanks!
I have not tested this, but I think one of these should work for you.
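$new = array();
$goodFeed = array();
$itemlimit = 0;
foreach ($feed->get_items() as $item) {
    $title = strtolower(trim($item->get_title()));
    // Add the item only if exactly the same normalized title was not added yet.
    if (!in_array($title, $new, true)) {
        if ($itemlimit == 50) { break; }
        $new[] = $title;
        $goodFeed[] = $item;
        $itemlimit = $itemlimit + 1;
    }
}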
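The first variant only skips titles that are identical after strtolower(trim()), so regional variants such as the "[NA]" title still get through. Here is a minimal string-only sketch of that check, with a hypothetical exactDedup() helper standing in for the feed loop:

```php
<?php
// Hypothetical string-only version of the first variant: a title is dropped
// only if the same lowercased, trimmed string has already been seen.
function exactDedup(array $titles, int $limit = 50): array
{
    $seen = array(); // normalized titles already kept
    $kept = array(); // original titles that survive
    foreach ($titles as $t) {
        if (count($kept) === $limit) {
            break;
        }
        $key = strtolower(trim($t));
        if (!in_array($key, $seen, true)) {
            $seen[] = $key;
            $kept[] = $t;
        }
    }
    return $kept;
}

$titles = array(
    'Metro Redux on Nintendo Switch™ Announce Trailer',
    '  metro redux on nintendo switch™ announce trailer ',
    'Metro Redux on Nintendo Switch™ Announce Trailer [NA]',
);

// The second title is dropped (same after normalization), the [NA] one is kept.
print_r(exactDedup($titles));
```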
-------OR-------
$new = array();
$goodFeed = array();
$itemlimit = 0;
foreach ($feed->get_items() as $item) {
    if ($itemlimit == 50) { break; }
    $title = strtolower(trim($item->get_title()));
    if (in_array($title, $new, true)) {
        continue; // exact duplicate of a title already added
    }
    // Skip the item if its title is more than 78% similar to any title already added.
    $isSimilar = false;
    foreach ($new as $n) {
        similar_text($n, $title, $percent);
        if ($percent > 78) {
            $isSimilar = true;
            break;
        }
    }
    if ($isSimilar) {
        continue;
    }
    $new[] = $title;
    $goodFeed[] = $item;
    $itemlimit = $itemlimit + 1;
}
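The second variant is a greedy filter: each title is compared only with titles that were already kept, and it is skipped as soon as one of them scores above 78%. Below is a minimal sketch of the same logic on plain strings, with a hypothetical similarDedup() helper. Since an exact duplicate of a non-empty title scores 100%, no separate in_array() check is needed here.

```php
<?php
// Hypothetical string-only version of the second variant: keep a title only
// if its similar_text() score against every already-kept title is <= $threshold.
function similarDedup(array $titles, float $threshold = 78, int $limit = 50): array
{
    $kept = array();           // original titles that survive
    $keptNormalized = array(); // their lowercased, trimmed forms
    foreach ($titles as $t) {
        if (count($kept) === $limit) {
            break;
        }
        $norm = strtolower(trim($t));
        foreach ($keptNormalized as $k) {
            similar_text($k, $norm, $percent);
            if ($percent > $threshold) {
                continue 2; // near-duplicate of a kept title: skip $t
            }
        }
        $keptNormalized[] = $norm;
        $kept[] = $t;
    }
    return $kept;
}

$titles = array(
    'Metro Redux on Nintendo Switch™ Announce Trailer',
    'Metro Redux on Nintendo Switch™ Announce Trailer [NA]',
    'Metro Redux on Nintendo Switch™ Announce Trailer [ANZ]',
    'The Elder Scrolls Online: The Dark Heart of Skyrim Announcement Cinematic',
    'The Elder Scrolls Online - The Dark Heart of Skyrim Cinematic Announcement Trailer',
);

print_r(similarDedup($titles));
```

With these five titles only the first Metro Redux title and the first Elder Scrolls title survive. The German-language title is left out on purpose. Whether it is dropped depends on where its score falls relative to the threshold, so it is worth printing that pair's score before settling on 78.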
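The problem was that the parser was not delivering the correct feed. I reworked the array structure, and now it works correctly. I also took some ideas from here: Similarity algorithm advice, using two dimensional associative array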
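The exact reworked array structure is not shown. The following is only an assumed illustration of a two-dimensional associative array for this task: a hypothetical groupSimilar() helper maps the normalized title of the first item in each cluster to the list of titles similar to it. The first entry of each cluster is the title to keep, and the rest are its near-duplicates.

```php
<?php
// Assumed structure (not the author's exact code): normalized representative
// title => list of original titles that are more than $threshold % similar to it.
function groupSimilar(array $titles, float $threshold = 78): array
{
    $groups = array();
    foreach ($titles as $t) {
        $norm = strtolower(trim($t));
        foreach (array_keys($groups) as $rep) {
            similar_text((string) $rep, $norm, $percent);
            if ($percent > $threshold) {
                $groups[$rep][] = $t; // join the first matching cluster
                continue 2;
            }
        }
        $groups[$norm] = array($t); // no match: start a new cluster
    }
    return $groups;
}

$titles = array(
    'Metro Redux on Nintendo Switch™ Announce Trailer',
    'Metro Redux on Nintendo Switch™ Announce Trailer [NA]',
    'Metro Redux on Nintendo Switch™ Announce Trailer [ANZ]',
    'The Elder Scrolls Online: The Dark Heart of Skyrim Announcement Cinematic',
    'The Elder Scrolls Online - The Dark Heart of Skyrim Cinematic Announcement Trailer',
);

$groups = groupSimilar($titles);
// The first title of each cluster is the one to keep.
$unique = array();
foreach ($groups as $members) {
    $unique[] = $members[0];
}
print_r($unique);
```

Keeping the whole cluster, rather than discarding duplicates outright, makes it easy to inspect which titles were merged when tuning the threshold.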
If anyone knows a good, still-maintained RSS parser (Node.js or PHP) that can merge several feeds into one, I would appreciate a link.