测试两个散列是否相等的big-O复杂性是什么？

Question

我认为答案是O（hash_equals）= n，但我正确吗？

例如在Ruby中，我们有：

$ irb
2.6.3 :001 > {a: 1, b: 2, c:3} == {b:2, a:1, c:3}
 => true

==的实现在这里：https://github.com/ruby/ruby/blob/trunk/hash.c#L3567

编辑：根据@Schwern（在下面的评论中），我们假设带有简单值的纯散列。我特别感兴趣的是Ruby，但是我也对大多数语言/哈希实现中的更普遍的问题感兴趣。

Answer 1

散列比较平均为O（键）+ ∑Cost（值）。

如果值简单，则比较每个值都是O（1）。哈希比较是O（keys）+ ∑O（1），它是O（keys）。

Ruby做了一些额外的事情，但是基本算法很简单。

def hash_eq(hash1, hash2)
  # O(1)
  return true if hash1.equal?(hash2)

  # O(1)
  return false if hash1.size != hash2.size

  # O(keys)
  hash1.each {|k,v|
    #               O(1)     O(value)
    return false if hash2[k] != v
  }

  return true
end

检查对象相等性只是比较两个指针，所以O（1）。

[Ruby散列将其大小存储在num_entries中，因此比较它们的大小为O（1）。

迭代哈希的键和值是O（keys），基本上就是在遍历数组。

在散列中查找值是O（1）。最糟糕的情况是O（n），但是在现代哈希中尤其如此，尤其是在发现它之后to be a security issue。

最后，比较值有其自身成本。对于简单的值（例如数字），此值为O（1）。其他任何东西都有其自己的成本，我称之为O（val）。这可以是大字符串（为O（n），其中n是字符串的大小），也可以是复杂的嵌套数据结构。我们需要对每个值执行此操作，并且每个值都有自己的成本。我将其写为所有这些费用的总和：∑Cost（value）。

因此，O（1）+ O（1）+ O（键）+ ∑成本（值）。 O（1）被O（键）淹没并消失，因此O（键）+ ∑Cost（值）。

一个简单的基准证明了这一点。

require 'benchmark'

def benchmark_hash_eq(&block)
  # We can't use h2 = h1.clone because that will not clone
  # the values. Then Ruby can cheat and use object equality to
  # compare the values.
  h1 = block.call
  h2 = block.call

  puts "Size: #{h1.size}"
  puts Benchmark.measure {
    10_000.times { h1 == h2 }
  }
end

puts "With number values"
benchmark_hash_eq do
  Hash[(1..100).collect { |i| [i, i] }]
end
benchmark_hash_eq do
  Hash[(1..1_000).collect { |i| [i, i] }]
end
benchmark_hash_eq do
  Hash[(1..10_000).collect { |i| [i, i] }]
end

puts "With large, equal size strings"
benchmark_hash_eq do
  Hash[(1..100).collect { |i| [i, "a" * 1000] }]
end
benchmark_hash_eq do
  Hash[(1..1_000).collect { |i| [i, "a" * 1000] }]
end
benchmark_hash_eq do
  Hash[(1..10_000).collect { |i| [i, "a" * 1000] }]
end

With number values
Size: 100
  0.019237   0.000043   0.019280 (  0.019306)
Size: 1000
  0.195047   0.000515   0.195562 (  0.196367)
Size: 10000
  1.913112   0.003115   1.916227 (  1.920030)

With large, equal size strings
Size: 100
  0.065414   0.000225   0.065639 (  0.065839)
Size: 1000
  0.669863   0.001145   0.671008 (  0.672279)
Size: 10000
 10.041987   0.013201  10.055188 ( 10.066840)

完全线性，除了最后一个可能是由于散列大小达到某个内存阈值。

请注意，Ruby可以并行进行比较。这不会改变复杂性，但是可能会使它执行得更快。

测试两个散列是否相等的big-O复杂性是什么？

问题描述投票：0回答：1

1个回答

最新问题

测试两个散列是否相等的big-O复杂性是什么？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1