获取两个数组之间特定数量的交集

Question

出于优化目的，我需要将两个数组相交，并在结果数组中保留两个初始数组中重复值的最少数量。

结果数组中值的顺序并不重要。

另一个重要的限制是时间复杂度，因为这将在一个大循环中执行。

为什么 array_intersect 不起作用：

来自 PHP 文档中的 Shawn Pyle ：

array_intersect 以不同的方式处理数组中的重复项。如果第一个数组中有重复项，所有匹配的重复项都会被退回。如果任何后续数组中有重复项它们将不会被退回。

规则：

返回 $arr2 中的 $arr1 的值
如果 $arr1 或 $arr2 包含重复值，则返回两者之间最少数量的值

示例：

intersect([1, 1, 2, 3, 4, 4, 5], [1, 3, 3, 5, 5])

返回

[1, 3, 5]

intersect([1, 1, 2, 3, 4, 4, 5], [1, 1, 1, 3, 3, 5, 5])

返回

[1, 1, 3, 5]

intersect([1, 1, 2, 3, 4, 4, 5, 5], [1, 3, 3, 5, 5])

返回

[1, 3, 5, 5]

intersect([1, 1, 1], [1, 1, 1])

返回

[1, 1, 1]

intersect([1, 2, 3], [1, 3, 2])

返回

[1, 2, 3]

Answer 1

嗨，我不得不说，乍一看@Aderrahim 的答案看起来非常好，但后来我尝试使用一种简单的方法并测试性能。

这是代码：

function intersectSimple($a, $b)
{
    $result = array();
    $short = count($a) < count($b) ? $a : $b;
    $long = count($a) < count($b) ? $b : $a;
    foreach ($short as $v) {
        if (in_array($v, $long)) {
            //if found add to results and remove from b
            $result[] = $v;
            unset($long[array_search($v, $long)]);
        }
    }
    return $result;
}

function intersectAderrahim($a, $b)
{
    $a_values_count = array_count_values($a);
    $b_values_count = array_count_values($b);

    $res = array_values(array_intersect($a, $b));
    $res_values_count = array_count_values($res);
    foreach ($res as $key => $val)
    {
        if ($res_values_count[$val] > $a_values_count[$val] || $res_values_count[$val] > $b_values_count[$val])
        {
            unset($res[$key]);
            $res_values_count[$val]--;
        }
    }

    return array_values($res);
}

//Start timer
$start = microtime(true);

echo "Start Test\n";
//Test code print each assert result
//Run code 100000 times
for ($i = 0; $i < 100000; $i++)
{
    $a = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    $b = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    $result = intersectSimple($a, $b);
    assert(count($result) == 10);
}

//Stop timer
$end = microtime(true);
$time = $end - $start;
//Print performance in microseconds
echo "Performance Simple: $time\n";

//Start timer
$start = microtime(true);

echo "Start Test\n";
//Test code print each assert result
//Run code 100000 times
for ($i = 0; $i < 100000; $i++)
{
    $a = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    $b = array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    $result = intersectAderrahim($a, $b);
    assert(count($result) == 10);
}
//Stop timer
$end = microtime(true);
$time = $end - $start;
//Print performance in microseconds
echo "Performance Aderrahim: $time\n";

所以我写了一个快速的性能测试并运行它，结果是：

Start Test
Performance Simple: 0.060362815856934
Start Test
Performance Aderrahim: 0.16634893417358

我不知道这是否可以推断到真实案例，但您可以在您的场景中尝试并测试哪个更好。我真的很想知道哪个是最好的真实数据。

Answer 2

这是我的尝试，如果可能的话，我基本上正在寻找一种更快的方法来做到这一点：

function intersect($a, $b)
{
    $a_values_count = array_count_values($a);
    $b_values_count = array_count_values($b);

    $res = array_values(array_intersect($a, $b));
    $res_values_count = array_count_values($res);
    foreach ($res as $key => $val)
    {
        if ($res_values_count[$val] > $a_values_count[$val] || $res_values_count[$val] > $b_values_count[$val])
        {
            unset($res[$key]);
            $res_values_count[$val]--;
        }
    }

    return array_values($res);
}

assert(intersect([1, 1, 2, 3, 4, 4, 5], [1, 3, 3, 5, 5]) == [1, 3, 5]);
assert(intersect([1, 1, 2, 3, 4, 4, 5], [1, 1, 1, 3, 3, 5, 5]) == [1, 1, 3, 5]);
assert(intersect([1, 1, 2, 3, 4, 4, 5, 5], [1, 3, 3, 5, 5]) == [1, 3, 5, 5]);
assert(intersect([1, 1, 1], [1, 1, 1]) == [1, 1, 1]);
assert(intersect([1, 2, 3], [1, 3, 2]) == [1, 2, 3]);

Answer 3

@AdriaRiuRuiz 的片段有几件事可以改进。

```
count()
```
不应在同一个未更改的数组上多次调用。
```
array_search()
```
与
```
in_array()
```
具有相同的用途，但返回第一个值的键。因此，
```
in_array()
```
可以省略；如果找到该值，则通过返回的键取消设置它。

代码：（基准）

function intersections(array $a, array $b): array
{
    $result = [];
    if (count($a) < count($b)) {
        $short = $a;
        $long = $b;
    } else {
        $short = $b;
        $long = $a;
    }
    foreach ($short as $v) {
        $index = array_search($v, $long);
        if ($index !== false) {
            $result[] = $v;
            unset($long[$index]);
        }
    }
    return $result;
}

根据输入数据，省略基于计数的数组排序可能会更快。

function intersections(array $a, array $b): array
{
    $result = [];
    foreach ($a as $v) {
        $index = array_search($v, $b);
        if ($index !== false) {
            $result[] = $v;
            unset($b[$index]);
        }
    }
    return $result;
}

获取两个数组之间特定数量的交集

问题描述投票：0回答：3

3个回答

最新问题

获取两个数组之间特定数量的交集

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3