I have an array like this:
['ball', 'football', 'volleyball', 'football player', 'football league', 'tennis']
I want to sort it by the keyword "football", like this:
['football', 'football player', 'football league', 'ball', 'volleyball', 'tennis']
How can I achieve this?
You need to write a custom comparison function and then use it with usort.
$array = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
function footsort($a, $b) {
    // true when the string starts with "football" (8 characters)
    $afoot = substr($a, 0, 8) == "football";
    $bfoot = substr($b, 0, 8) == "football";
    if ($afoot == $bfoot) return strcmp($a, $b);
    /*else*/
    if ($afoot) return -1;
    if ($bfoot) return 1;
}
usort($array, "footsort");
print_r($array);
Output:
Array
(
[0] => football
[1] => football league
[2] => football player
[3] => ball
[4] => tennis
[5] => volleyball
)
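The comparator above hard-codes both the keyword and its length (8). A minimal sketch of a reusable variant follows; prefixFirst is a hypothetical helper name, not something from the answer, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: builds a comparator that sorts strings starting
// with $keyword first, and alphabetically within each group.
function prefixFirst(string $keyword): callable {
    $len = strlen($keyword);
    return function (string $a, string $b) use ($keyword, $len): int {
        // strncmp() returns 0 when the first $len bytes are identical
        $aHit = strncmp($a, $keyword, $len) === 0;
        $bHit = strncmp($b, $keyword, $len) === 0;
        if ($aHit !== $bHit) {
            return $aHit ? -1 : 1;
        }
        return strcmp($a, $b);
    };
}

$array = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
usort($array, prefixFirst("football"));
print_r($array);
```

With the keyword "football" this should give the same order as the output above; passing a different keyword needs no change to the comparator.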
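If you want to sort by whether the keyword appears anywhere in the word, rather than only at the beginning, you can use strpos in the comparison function.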
$keyword = 'football';
// $things is the array of strings to sort
usort($things, function($a, $b) use ($keyword) {
    $x = strpos($a, $keyword) === false;
    $y = strpos($b, $keyword) === false;
    if ($x && !$y) return 1;
    if ($y && !$x) return -1;
    // use this if you want to sort alphabetically after the keyword sort:
    return strcmp($a, $b);
    // or, if you only want to sort by whether or not the keyword was found,
    // replace the line above with:
    // return 0;
});
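To see how this differs from a prefix-only check, here is a small runnable sketch using the array from the question and the keyword "ball" (chosen only for illustration, since it appears in the middle of several words). The two keyword checks are folded into one comparison, which should be equivalent.

```php
<?php
$things = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
$keyword = "ball"; // illustrative keyword, matches anywhere in the string

usort($things, function ($a, $b) use ($keyword) {
    $x = strpos($a, $keyword) === false; // true when $a lacks the keyword
    $y = strpos($b, $keyword) === false; // true when $b lacks the keyword
    if ($x !== $y) {
        return $x ? 1 : -1;              // strings with the keyword come first
    }
    return strcmp($a, $b);               // alphabetical within each group
});

print_r($things);
```

Only "tennis" lacks the keyword, so it ends up last; everything else is ordered alphabetically ahead of it.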
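If you have the more general goal of sorting an array by how "close" each term is to a keyword, the comparison has to become more complex, and how it should be done really depends on which aspects of "closeness" matter most to you. Here is a more elaborate sort. It may not be exactly what you want; it is only meant to show how complicated determining "closeness" can get: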
$keyword = 'football';
usort($things, function($a, $b) use ($keyword) {
    // prioritize exact matches first
    if ($a == $keyword) return -1;
    if ($b == $keyword) return 1;
    // prioritize terms containing the keyword next
    $x = strpos($a, $keyword);
    $y = strpos($b, $keyword);
    if ($x !== false && $y === false) return -1;
    if ($y !== false && $x === false) return 1;
    if ($x !== false && $y !== false) { // both terms contain the keyword, so...
        if ($x != $y) { // prioritize matches closer to the beginning of the term
            return $x > $y ? 1 : -1;
        }
        // both terms contain the keyword at the same position, so...
        $al = strlen($a);
        $bl = strlen($b);
        if ($al != $bl) { // prioritize terms with fewer characters other than the keyword
            return $al > $bl ? 1 : -1;
        }
        // both terms contain the same number of additional characters
        return 0;
        // or sort alphabetically with strcmp($a, $b);
        // or do additional checks...
    }
    // neither term contains the keyword
    // check the character similarity...
    $ac = levenshtein($keyword, $a);
    $bc = levenshtein($keyword, $b);
    if ($ac != $bc) {
        return $ac > $bc ? 1 : -1;
    }
    return 0;
    // or sort alphabetically with strcmp($a, $b);
    // or do additional checks, similar_text, etc.
});
I tried to understand your question and attempted to solve it like this:
<?php
$abc = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
$word = "football";
$final = array();
// collect exact matches
foreach ($abc as $key => $value) {
    if ($value == $word) {
        $final[] = $value;
        unset($abc[$key]);
    }
}
// collect strings that contain the word
foreach ($abc as $key => $value) {
    if (strpos($value, $word) !== false) {
        $final[] = $value;
        unset($abc[$key]);
    }
}
// collect strings that are themselves part of the word
foreach ($abc as $key => $value) {
    if (strpos($word, $value) !== false) {
        $final[] = $value;
        unset($abc[$key]);
    }
}
// collect the rest of the elements
$final = array_merge($final, $abc);
print_r($final);
?>
The output is:
Array
(
[0] => football
[1] => football player
[2] => football league
[3] => ball
[4] => volleyball
[5] => tennis
)
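The three passes above amount to giving each element a rank and ordering by it. A minimal sketch of that idea as a single sort follows; matchRank and sortByMatch are hypothetical names introduced here, and the tie-break on the original index is an assumption made so that input order is kept within each group on any PHP version (usort is only guaranteed stable from PHP 8.0).

```php
<?php
// Hypothetical helper: lower rank means a closer match to $word.
function matchRank(string $value, string $word): int {
    if ($value === $word) return 0;                                  // exact match
    if (strpos($value, $word) !== false) return 1;                   // value contains the word
    if ($value !== '' && strpos($word, $value) !== false) return 2;  // word contains the value
    return 3;                                                        // no relation found
}

// Hypothetical helper: returns $items ordered by rank, keeping input order within a rank.
function sortByMatch(array $items, string $word): array {
    $items = array_values($items);
    $ranks = [];
    foreach ($items as $i => $value) {
        $ranks[$i] = matchRank($value, $word);
    }
    $keys = array_keys($items);
    usort($keys, function (int $i, int $j) use ($ranks): int {
        // compare by rank first, then by original position
        return [$ranks[$i], $i] <=> [$ranks[$j], $j];
    });
    return array_map(function (int $i) use ($items) {
        return $items[$i];
    }, $keys);
}

$abc = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
print_r(sortByMatch($abc, "football"));
```

For the sample array this should reproduce the order shown above, and adding a new kind of match only means adding another rank.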
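This is a fun little problem. You need to assign some kind of score to each element of the haystack (the array of words). I think the best way to score each element is based on the classic dynamic programming problem "longest common substring". The longer that substring, the higher the sort score.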
//find the longest common substring between 2 strings, return all substrings of that length
function longestCommonSubstring($string1, $string2) {
    $helper = array();
    //create a two-dimensional array to keep track of match lengths
    for ($i = 0; $i < strlen($string1); $i++) {
        $helper[$i] = array();
        for ($j = 0; $j < strlen($string2); $j++) {
            //initialize all values to 0
            $helper[$i][] = 0;
        }
    }
    $max = 0;
    $ans = array();
    for ($i = 0; $i < strlen($string1); $i++) {
        for ($j = 0; $j < strlen($string2); $j++) {
            if ($string1[$i] == $string2[$j]) {
                if ($i == 0 || $j == 0) {
                    $helper[$i][$j] = 1;
                } else {
                    $helper[$i][$j] = $helper[$i-1][$j-1] + 1;
                }
                if ($helper[$i][$j] > $max) {
                    $max = $helper[$i][$j];
                    $ans = array(substr($string1, $i-$max+1, $max));
                } elseif ($helper[$i][$j] == $max) {
                    $ans[] = substr($string1, $i-$max+1, $max);
                }
            } else {
                $helper[$i][$j] = 0;
            }
        }
    }
    return $ans;
}
Now that the function is written, we need to use it.
foreach ($words as $word) {
    $lcs = longestCommonSubstring($keyword, $word);
}
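OK, that is all well and good, but just using the function is only half the battle; now we need to apply some logic to the results. Let's save the results in an array and give each word a score. A good score is the length of the longest common substring: football is a better match than ball because it shares a longer common substring with the keyword. But what about football and football player, whose longest common substrings are the same length? To resolve that, we can use that length as a percentage of the total word length. Combining these two ideas, longest-substring length and percentage, we get the code below.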
//an associative array to save the scores
// $wordsMeta[$word] = array(lengthOfCommonSubstring, percentageOfWordMatched)
$wordsMeta = array();
//go through each word and assign a score
foreach ($words as $word) {
    $lcs = longestCommonSubstring($keyword, $word);
    if (count($lcs) == 0) {
        $wordPercentage = 0;
        $wordLength = 0;
    } else {
        $wordLength = strlen($lcs[0]);
        $wordPercentage = $wordLength / strlen($word);
    }
    $wordsMeta[$word] = array(
        "percentageOfWordMatched" => $wordPercentage,
        "lengthOfCommonSubstring" => $wordLength
    );
}
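Now we just need a sorting function that looks at the length first and, if the lengths are equal, looks at the percentage and returns the appropriate integer.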
//our special sorting function
//checks length, if that is equal, then it checks percentage of word matched
//if both are equal, then those two elements are considered equal
$sort = function($a, $b) {
    $ans = $a["lengthOfCommonSubstring"] - $b["lengthOfCommonSubstring"];
    if ($ans == 0) {
        $ans = $a["percentageOfWordMatched"] - $b["percentageOfWordMatched"];
    }
    if ($ans < 0) {
        $ans = -1;
    } elseif ($ans > 0) {
        $ans = 1;
    } else {
        $ans = 0;
    }
    //higher score = earlier in the sort order, so flip the sign
    $ans *= -1;
    return $ans;
};
Now for the easy part: call uasort($wordsMeta, $sort) and then $answer = array_keys($wordsMeta).
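Putting the pieces together, a compact end-to-end sketch might look like the following. It is not the code above verbatim: lcsLength and rankByKeyword are hypothetical names, the helper returns only the length of the longest common substring (an assumption that the substrings themselves are not needed for scoring), and it keeps just two rows of the table instead of the full grid.

```php
<?php
// Hypothetical helper: length of the longest common substring of $s and $t.
function lcsLength(string $s, string $t): int {
    $n = strlen($s);
    $m = strlen($t);
    $best = 0;
    $prev = array_fill(0, $m + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $curr = array_fill(0, $m + 1, 0);
        for ($j = 1; $j <= $m; $j++) {
            if ($s[$i - 1] === $t[$j - 1]) {
                // extend the common run ending at the previous characters
                $curr[$j] = $prev[$j - 1] + 1;
                $best = max($best, $curr[$j]);
            }
        }
        $prev = $curr;
    }
    return $best;
}

// Hypothetical helper: words ordered by (substring length, fraction of word matched), best first.
function rankByKeyword(array $words, string $keyword): array {
    $meta = [];
    foreach ($words as $word) {
        $len = lcsLength($keyword, $word);
        $meta[$word] = [$len, $word === '' ? 0 : $len / strlen($word)];
    }
    uasort($meta, function (array $a, array $b): int {
        return $b <=> $a; // operands swapped for descending order
    });
    return array_keys($meta);
}

$words = ["ball", "football", "volleyball", "football player", "football league", "tennis"];
print_r(rankByKeyword($words, "football"));
```

Note that football player and football league get identical scores here, so their relative order depends on whether the sort is stable; add a tie-break if that matters.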
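A word of warning: this algorithm is slow. Very slow. The lcs function is O(n*m), and we call it count($words) times, which makes the scoring step O(n*m*x), where:
n is strlen($keyword)
m is strlen($word)
x is count($words)
On top of that we sort, which is O(x*log(x)) comparisons. So overall this algorithm is O(n*m*x + x*log(x)), which is not good. Keeping the word list short, the words in it short, and the keyword short will keep the running time down.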