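I'm working with some large datasets and trying to improve performance. I need to determine whether an object is contained in an array. I was considering using either index or include?, so I benchmarked both.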
require 'benchmark'

a = (1..1_000_000).to_a
num = 100_000
reps = 100

Benchmark.bmbm do |bm|
  bm.report('include?') do
    reps.times { a.include? num }
  end
  bm.report('index') do
    reps.times { a.index num }
  end
end
Surprisingly (to me), index was considerably faster.
               user     system      total        real
include?   0.330000   0.000000   0.330000 (  0.334328)
index      0.040000   0.000000   0.040000 (  0.039812)
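Since index provides more information than include?, I would have expected it to be slightly slower, if anything, but that is not the case. Why is it faster?
(I know index comes directly from the Array class and include? is inherited from Enumerable. Might that explain it?)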
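Looking at the Ruby MRI source, it seems that index uses the optimized rb_equal_opt, while include? uses rb_equal. This can be seen in rb_ary_includes and rb_ary_index. Here is the commit that made the change. It's not clear to me why it is used for index and not for include?.
You may also find it interesting to read the discussion about this feature.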
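Because the gap described above comes from an implementation detail of MRI, it may differ or disappear depending on the Ruby version. The following is a minimal sketch for re-checking it on your own interpreter; it reuses the array size and repetition count from the question, and the variable names (found_by_include, include_time, and so on) are only illustrative.

```ruby
require 'benchmark'

a = (1..1_000_000).to_a
num = 100_000
reps = 100

# Both methods compare elements with ==, so they should agree on membership.
found_by_include = a.include?(num)
found_by_index = !a.index(num).nil?

# Time each lookup separately; absolute numbers depend on machine and Ruby version.
include_time = Benchmark.realtime { reps.times { a.include?(num) } }
index_time = Benchmark.realtime { reps.times { a.index(num) } }

puts "Ruby #{RUBY_VERSION}"
puts format('include? %.6f s', include_time)
puts format('index    %.6f s', index_time)
```

If the two timings are close on your version, the choice between index and include? can be made on readability alone.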
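I ran the same kind of benchmark. It looks like include? is faster than index, but not very consistently. Here are the results from two different scenarios.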
               user     system      total        real
index      0.065803   0.000652   0.066455 (  0.067181)
include?   0.065551   0.000590   0.066141 (  0.066894)

               user     system      total        real
index      0.000034   0.000005   0.000039 (  0.000037)
include?   0.000017   0.000001   0.000018 (  0.000017)
Code:
require 'benchmark'

# Parse ranks and return the number of people who have someone to report to
# (someone whose rank is exactly one higher), using Array#index for the lookup.
def solution_using_index(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1_000_000_000 || ranks.min < 0

  grouped_ranks = ranks.group_by(&:itself)
  report_to = 0
  rank_keys = grouped_ranks.keys
  rank_keys.each { |rank| report_to += grouped_ranks[rank].length if rank_keys.index(rank + 1) }
  report_to
end

# Same logic as above, but using Array#include? for the lookup.
def solution_using_include(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1_000_000_000 || ranks.min < 0

  grouped_ranks = ranks.group_by(&:itself)
  report_to = 0
  rank_keys = grouped_ranks.keys
  rank_keys.each { |rank| report_to += grouped_ranks[rank].length if rank_keys.include?(rank + 1) }
  report_to
end

test_data = [[3, 4, 3, 0, 2, 2, 3, 0, 0], [4, 4, 3, 3, 1, 0], [4, 2, 0]]

Benchmark.bmbm do |bm|
  bm.report('index') do
    test_data.each do |ranks|
      solution_using_index(ranks)
    end
  end
  bm.report('include?') do
    test_data.each do |ranks|
      solution_using_include(ranks)
    end
  end
end
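
The test arrays in this benchmark hold only a handful of elements, so the measured times are in the microsecond range and could easily be dominated by noise. As a rough sketch, the same counting logic can be run against a larger input. Everything here is an assumption for illustration: count_reports is a condensed, hypothetical version of the two methods that skips their edge-case guards, the lookup parameter selects the membership test, and the ranks are randomly generated with a fixed seed.

```ruby
require 'benchmark'

# Condensed version of the counting logic; `lookup` is a hypothetical
# parameter that selects which membership test is used.
def count_reports(ranks, lookup)
  grouped = ranks.group_by(&:itself)
  keys = grouped.keys
  # Add the size of each rank group whose next-higher rank is present.
  keys.sum { |rank| lookup.call(keys, rank + 1) ? grouped[rank].length : 0 }
end

# Array#index returns a position (0 is truthy in Ruby) or nil.
via_index = ->(keys, rank) { keys.index(rank) }
via_include = ->(keys, rank) { keys.include?(rank) }

# Hypothetical larger input: 50,000 ranks drawn from 0..2000, fixed seed.
rng = Random.new(42)
ranks = Array.new(50_000) { rng.rand(0..2_000) }

Benchmark.bmbm do |bm|
  bm.report('index') { count_reports(ranks, via_index) }
  bm.report('include?') { count_reports(ranks, via_include) }
end
```

The lambda call adds the same overhead to both variants, so the comparison between them should still be fair, although the absolute times will be somewhat higher than with direct calls.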