How to create a custom hash function in Rust

Question

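How can I create a custom hash function that can be used in HashMap and HashSet? I want to use Szudzik's pairing function, but all the documentation I have read says that a Hasher works on an arbitrary stream of bytes.

_hash_val is the hash value I want to use.

Reference: https://en.wikipedia.org/wiki/Pairing_function#Other_pairing_functions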

use std::hash::{Hash, Hasher};

#[derive(Copy, Clone)]
pub struct Position {
    pub x: i32,
    pub y: i32,
}

impl Position {
    // Constructor will pass in x and y
    pub fn new(x: i32, y: i32) -> Self {
        Self { x: x, y: y }
    }

}

impl PartialEq for Position {
    fn eq(&self, other: &Self) -> bool {
        self.x == other.x && self.y == other.y
    }
}

impl Eq for Position {}

impl Hash for Position {
    fn hash<H: Hasher>(&self, _state: &mut H) {
        let x: u64 = self.x.abs() as u64;
        let y: u64 = self.y.abs() as u64;
        let mut _hash_val: u64 = 0;

        /* szudziks function */
        if x >= y {
            _hash_val = x * x + x + y;
        } else {
            _hash_val = x + y * y;
        }
    }
}

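All the documentation I have read suggests that I need to implement std::hash::Hasher, but the documentation states:

A trait for hashing an arbitrary stream of bytes.

Is there a way to create a custom hash function that does not work on an arbitrary stream of bytes?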

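Edit:

After reading the first line of the documentation I did not read any further, because I assumed that a byte is 8 bits and that this was all the trait could operate on. As cdhowie pointed out, that is a bit misleading, because you can use the write_u64() method.

https://doc.rust-lang.org/std/hash/trait.Hasher.html#method.write_u64

Modifying the Hash implementation above to use that method: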

impl Hash for Position {
    fn hash<H: Hasher>(&self, state: &mut H) {
        assert!(self.x >= 0);
        assert!(self.y >= 0);
    
        let x: u64 = self.x as u64;
        let y: u64 = self.y as u64;
    
        /* szudzik's pairing function */
        let hash_val: u64 = if x >= y {
            x * x + x + y
        } else {
            x + y * y
        };
    
        state.write_u64(hash_val);
    }
}
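
As a side check, the pairing function itself can be tested in isolation. The sketch below assumes a free function named szudzik (a name introduced here for illustration, not part of the original code) and confirms that every pair in a small grid of non-negative inputs maps to a distinct value:

```rust
use std::collections::HashSet;

/// Szudzik's pairing function for two non-negative integers.
/// (`szudzik` is an illustrative name, not from the original post.)
fn szudzik(x: u64, y: u64) -> u64 {
    if x >= y {
        x * x + x + y
    } else {
        x + y * y
    }
}

fn main() {
    // Same value that Position::new(1, 2) produces: 1 + 2 * 2.
    assert_eq!(szudzik(1, 2), 5);
    // Swapping the arguments gives a different value: 2 * 2 + 2 + 1.
    assert_eq!(szudzik(2, 1), 7);

    // Every pair in a 100 x 100 grid maps to a distinct value.
    let mut seen = HashSet::new();
    for x in 0..100u64 {
        for y in 0..100u64 {
            assert!(seen.insert(szudzik(x, y)), "collision at ({}, {})", x, y);
        }
    }
    assert_eq!(seen.len(), 10_000);
}
```

The uniqueness only holds for non-negative inputs, which is why the implementation above asserts that x and y are not negative.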

Answer 1

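Hashing for Rust's hash table collections is a three-step process governed by three traits.

Hash is the trait that the elements you want to hash need to implement. All it does is feed bytes to a hasher. You are almost there; you just need to write the value into the hasher.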

fn hash<H: Hasher>(&self, state: &mut H) {
    let x = self.x.unsigned_abs() as u64;
    let y = self.y.unsigned_abs() as u64;

    /* szudziks function */
    let hash_val = if x >= y { x * x + x + y } else { x + y * y };
    state.write_u64(hash_val);
}

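I have changed your abs to unsigned_abs to avoid overflow: i32::MIN has no positive counterpart in i32, so abs cannot represent its magnitude.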
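
To make the overflow concrete, here is a minimal sketch. The helper name magnitude is introduced here for illustration only. It shows that the magnitude of i32::MIN cannot be represented as an i32, that unsigned_abs handles it, and that the largest possible pairing value still fits in a u64:

```rust
/// Hypothetical helper: widen the magnitude of an i32 to u64 without overflow.
fn magnitude(v: i32) -> u64 {
    u64::from(v.unsigned_abs())
}

fn main() {
    // |i32::MIN| does not fit in an i32, so the checked variant reports failure.
    assert_eq!(i32::MIN.checked_abs(), None);

    // unsigned_abs returns a u32, which can hold 2^31.
    assert_eq!(i32::MIN.unsigned_abs(), 2_147_483_648u32);
    assert_eq!(magnitude(-3), 3);

    // Worst case for the pairing function: x = y = 2^31.
    let m = magnitude(i32::MIN);
    let largest = m * m + m + m; // 2^62 + 2^32
    assert_eq!(largest, (1u64 << 62) + (1u64 << 32));
}
```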

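This Hash impl works with any Hasher, so you can use it right away. However, most Hasher types (including the default one for HashMap/HashSet) do not assume that their input is evenly distributed, so they will run the bytes through an actual hash function to produce the final hash value.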
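
As a minimal sketch of using the impl right away, the following uses the standard library's DefaultHasher and a HashSet with its default hasher. The Position type is repeated (with derived equality) so the example is self-contained, and default_hash is a helper name introduced here for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Copy, Clone, PartialEq, Eq)]
struct Position {
    x: i32,
    y: i32,
}

impl Hash for Position {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let x = u64::from(self.x.unsigned_abs());
        let y = u64::from(self.y.unsigned_abs());
        let hash_val = if x >= y { x * x + x + y } else { x + y * y };
        state.write_u64(hash_val);
    }
}

/// Hypothetical helper: hash a value with a fresh DefaultHasher.
fn default_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = Position { x: 1, y: 2 };
    let b = Position { x: 1, y: 2 };
    // Equal values feed the same u64 to the hasher, so they hash identically.
    assert_eq!(default_hash(&a), default_hash(&b));

    // No custom hasher is needed for the collection to work.
    let mut set = HashSet::new();
    set.insert(a);
    assert!(set.contains(&b));

    // (-1, 2) feeds the same pairing value as (1, 2), but Eq still tells them apart.
    assert!(!set.contains(&Position { x: -1, y: 2 }));
}
```

Because the magnitudes are paired, positions that differ only in sign collide. That is permitted by the Hash contract, since only equal values are required to hash equally.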

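If you believe your values are already evenly distributed enough for your use case, you can create a custom hasher that passes the u64 through unchanged. If you cannot ensure that the values are evenly distributed, you will observe terrible performance from the collections.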

#[derive(Default, Clone, Copy)]
pub struct IdentityHash(u64);

impl Hasher for IdentityHash {
    fn finish(&self) -> u64 {
        self.0
    }
    
    fn write(&mut self, _bytes: &[u8]) {
        panic!("This hasher only takes u64");
    }

    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
}

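There are many ways to do this, depending on what you want to happen if the hasher is used with a type that is not a single u64, but I have written the simplest one as an example.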
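
One of those alternatives, sketched under assumptions: instead of panicking, a hasher could fold any other bytes into its state. LenientIdentityHash and lenient_hash are hypothetical names, and the rotate-and-xor fold is only a placeholder mixing step, not a good hash function:

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical variant of IdentityHash that tolerates non-u64 input.
#[derive(Default, Clone, Copy)]
struct LenientIdentityHash(u64);

impl Hasher for LenientIdentityHash {
    fn finish(&self) -> u64 {
        self.0
    }

    // Fallback for anything that is not written via write_u64:
    // fold each byte into the state (placeholder mixing, poor distribution).
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }

    // A u64 still passes through unchanged, replacing the current state.
    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
}

/// Hypothetical helper: hash a value with a fresh LenientIdentityHash.
fn lenient_hash<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = LenientIdentityHash::default();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // u64 values keep the identity behaviour.
    assert_eq!(lenient_hash(&42u64), 42);

    // A str goes through write() and no longer panics.
    assert_eq!(lenient_hash("abc"), lenient_hash("abc"));
}
```

Note that write_u64 overwrites the state, so a type that writes several u64 values would keep only the last one. Whether that is acceptable depends on the keys you plan to store.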

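The last part is BuildHasher, which is how hash table collections reset the hasher between uses. In this case, since IdentityHash does not have much state, you can implement it on the same type and have it copy itself.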

impl BuildHasher for IdentityHash {
    type Hasher = Self;

    fn build_hasher(&self) -> Self::Hasher {
        *self
    }
}
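
As an alternative sketch, the standard library's std::hash::BuildHasherDefault can supply the BuildHasher for any hasher that implements Default, so the manual impl is optional. The alias IdentityBuildHasher and the function build_map are names introduced here for illustration, and the types are repeated so the example is self-contained:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

#[derive(Copy, Clone, PartialEq, Eq)]
struct Position {
    x: i32,
    y: i32,
}

impl Hash for Position {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let x = u64::from(self.x.unsigned_abs());
        let y = u64::from(self.y.unsigned_abs());
        let hash_val = if x >= y { x * x + x + y } else { x + y * y };
        state.write_u64(hash_val);
    }
}

#[derive(Default)]
struct IdentityHash(u64);

impl Hasher for IdentityHash {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, _bytes: &[u8]) {
        panic!("This hasher only takes u64");
    }

    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
}

/// Illustrative alias: builds a fresh IdentityHash via Default for each use.
type IdentityBuildHasher = BuildHasherDefault<IdentityHash>;

/// Hypothetical helper that fills a small map keyed by Position.
fn build_map() -> HashMap<Position, &'static str, IdentityBuildHasher> {
    let mut map = HashMap::default();
    map.insert(Position { x: 1, y: 2 }, "start");
    map.insert(Position { x: 3, y: 0 }, "goal");
    map
}

fn main() {
    let map = build_map();
    assert_eq!(map.get(&Position { x: 1, y: 2 }), Some(&"start"));
    assert_eq!(map.get(&Position { x: 0, y: 0 }), None);
}
```

With this approach the hasher type does not need to be Copy or Clone, only Default.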

Now you can create collections with these types.

fn main() {
    let pos = Position::new(1, 2);
    let mut hasher = IdentityHash::default();
    pos.hash(&mut hasher);
    assert_eq!(hasher.finish(), 5);

    let mut set = std::collections::HashSet::with_hasher(IdentityHash::default());
    set.insert(Position::new(1, 2));
    assert!(set.contains(&Position::new(1, 2)));
}

