我正在研究化学元素,我想要一个“简单”的函数,将元素符号映射到数组的唯一索引中。我知道许多编程语言都实现了哈希,但我更感兴趣的是您通常如何处理这样的问题。在元素之上,我还包括了 Me、Et、Ph 和 D - 因此完整的符号列表如下所示:
Ag、Al、Ar、As、Au、B、Ba、Be、Bi、Br、C、Ca、Cd、Ce、Cl、Co、Cr、Cs、Cu、D、Dy、Er、Et、Eu、F , Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Me, Mg, Mn, Mo, N, Na, Nb, Nd , Ne, Ni, O, Os, P, Pa, Pb, Pd, Ph, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb , Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
“最佳”解决方案是具有尽可能低的最高指数,并且指数的计算需要简单。你会怎么做?通过一些随机尝试,我得到了一个最高索引为 355 的版本,使用以下公式,效率约为 25%(356 个桶中的 88 个符号):
A = CharcodeFirstLetter * 21 mod 325
B = CharcodeSecondLetter-60 'B is set to 0 if symbol does not have a second letter
index = A + B - 13
是否有任何系统的方法可以从一组通用的起始符号创建此类公式?
一种方法是以编程方式尝试一堆参数值并获得最佳结果。我制作了一串符号,并在只有一个字符的地方为第二个字符留了一个空格。我发现了很多比你的 356 个桶更好的例子。到目前为止我发现的最好的是 238 个桶。
uint32_t A = (symbol[0]*207) % 232;
uint32_t B = (symbol[1] == ' ' ? 46 : symbol[1] - 60);
index = A+B-51;
您可以创建一个完美的哈希值。例如。使用
gperf
我生成了一个哈希,将 88 个键映射到 [8..96] 范围内的唯一整数。
C++ 代码现代化为 constexpr 函数:
constexpr inline unsigned int hash(std::string_view str) {
static constexpr std::array<uint8_t, 128> asso_values{
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97,
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97,
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 27,
10, 8, 7, 56, 87, 55, 44, 57, 97, 84, 48, 46, 6, 12, 22, 97, 28, 20, 17, 89, 11, 9,
79, 66, 53, 97, 97, 97, 97, 97, 97, 7, 16, 64, 19, 6, 27, 26, 29, 25, 97, 97, 32, 61,
34, 31, 97, 97, 10, 52, 36, 20, 97, 97, 97, 62, 97, 97, 97, 97, 97, 97};
unsigned hval = str.length();
switch (hval) {
default : hval += asso_values[static_cast<uint8_t>(str[1])]; [[fallthrough]];
case 1 : hval += asso_values[static_cast<uint8_t>(str[0])]; break;
}
return hval;
}
现在您甚至可以在编译时执行映射!反向映射:
静态 constexpr std::array
#include <array>
#include <cstdint>
#include <string_view>
/* C++ code produced by gperf version 3.1 */
/* Command-line: gperf -L C++ -7 -C -G -E -m 100 test.txt */
/* Computed positions: -k'1-2' */
/* Modernized by hand, @sehe */
enum {
TOTAL_KEYWORDS = 88,
MIN_WORD_LENGTH = 1,
MAX_WORD_LENGTH = 2,
MIN_HASH_VALUE = 7,
MAX_HASH_VALUE = 96
};
/* maximum key range = 90, duplicates = 0 */
constexpr inline unsigned int hash(std::string_view str) {
constexpr std::array<uint8_t, 128> asso_values{
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97,
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97,
97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 27,
10, 8, 7, 56, 87, 55, 44, 57, 97, 84, 48, 46, 6, 12, 22, 97, 28, 20, 17, 89, 11, 9,
79, 66, 53, 97, 97, 97, 97, 97, 97, 7, 16, 64, 19, 6, 27, 26, 29, 25, 97, 97, 32, 61,
34, 31, 97, 97, 10, 52, 36, 20, 97, 97, 97, 62, 97, 97, 97, 97, 97, 97};
unsigned hval = str.length();
switch (hval) {
default : hval += asso_values[static_cast<uint8_t>(str[1])]; [[fallthrough]];
case 1 : hval += asso_values[static_cast<uint8_t>(str[0])]; break;
}
return hval;
}
static constexpr std::array<std::string_view, 97> wordlist{
"", "", "", "", "", "", "", "N", "D", "C", "W", "B", "V", "O", "Ne", "Na", "Ce",
"Ca", "Be", "Ba", "Cr", "S", "Br", "P", "Nb", "Te", "Ta", "Nd", "Se", "Cd", "Cu", "Pa", "Sr", "Ni",
"Pr", "Tb", "Re", "Bi", "Sb", "Ar", "Pb", "Co", "Cl", "Pd", "Ti", "H", "Rb", "Si", "Th", "Au", "Ru",
"Tl", "He", "Ph", "Me", "Ag", "Sn", "La", "I", "Rh", "Pt", "Al", "Cs", "Ge", "Ga", "Zr", "Os", "Y",
"Er", "Ir", "Lu", "Dy", "Hg", "Hf", "Mg", "Li", "Gd", "Ho", "Eu", "Mo", "Tm", "As", "Mn", "Sm", "Yb",
"K", "Sc", "Xe", "F", "Zn", "U", "", "", "In", "Et", "Fe", "Kr"};
bool in_word_set(std::string_view s) {
if (auto len = s.length(); len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
if (unsigned key = hash(s); key < wordlist.size())
return wordlist[key] == s;
return false;
}