我有一批ID,我想用良好的线性扩展功能对其进行分区。ID不包含时间戳,并且确实传播不佳。我只限于几个哑巴xpath operators。
您能否提出更好的功能来在10个存储桶之间分配ID?
public static void main(String[] args) {
int[] buckets = new int[10];
for (int i = 0; i < 10; i++)
buckets[i] = 0;
for (int i = 0; i < 1000; i++) {
String id = format("130770%s0020", i);
long l = parseLong(id);
int partition = (int) f(l);
buckets[partition] = buckets[partition] + 1;
}
for (int i = 0; i < 10; i++)
out.println(buckets[i]);
}
目前我最好的结果是
private static long f(long l) {
return l % 31 % 10;
}
给出
130 96 97 96 97 97 97 96 98 97 96
您可以提出更好的实施方案吗?
如果您的目标是使事物在存储桶之间平均分配,这似乎可行:
return ((l / 10000) % 1000) % 10;
((这只是从数字中提取i
回来]
输出:
100 100 100 100 100 100 100 100 100 100
似乎提供相同结果的替代方法:
// NB: abs(int) isn't always non-negative. Should really handle Integer.MIN_VALUE.
return Math.abs(Long.toString(l).hashCode()) % 10;
输出:
100 100 100 100 100 100 100 100 100 100
我建议选择与HashMap
类相同的解决方案。
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
对于您的代码,其含义是:
return (l ^ (l >>> 16)) % 10;
使用您的测试数据,产生的传播范围:
109 102 103 94 91 95 93 100 104 109
来自comment:
我没有班次
表达式l >>> 16
也可以写为l / 65536
,但是除法比移位慢很多,因此这就是为什么您通常使用l >>> 16
。
UPDATE来自另一个comment:
我没有XOR运算符
使用+
代替^
不好,但是在这里看起来足够好:
return (l + (l / 65536)) % 10;
结果传播:
101 92 92 99 105 104 105 99 97 106