How do I measure a function's memory usage?

Question

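I noticed that Rust's test framework has a benchmark mode that measures execution time (reported in ns/iter), but I could not find a way to measure memory usage.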

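How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would certainly be interesting too).

Edit: I found this question, which asks for exactly the same thing.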

Answer 1

You can use the jemalloc allocator to print allocation statistics. For example,

Cargo.toml:

[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"

[dependencies]
jemallocator = "0.5"
jemalloc-sys = {version = "0.5", features = ["stats"]}
libc = "0.2"

src/main.rs:

use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

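// Callback handed to jemalloc; it receives the report as C strings.
extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    // `c_char` is `i8` or `u8` depending on the target, so pass the pointer uncast.
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message).to_bytes()
    }));
}

fn mem_print() {
    // A null options string requests the default report.
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) }
}

fn main() {
    mem_print();
    let _heap = Vec::<u8>::with_capacity(1024 * 128);
    mem_print();
}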

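In a single-threaded program, that should give you a good measurement of how much memory a structure takes. Print the statistics before and after the structure is created and calculate the difference.

(In particular, look at the "total:" figure under "allocated".)

divan implements a custom allocator for measuring memory usage in benchmarks.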

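You can also use Valgrind (Massif) to get a heap profile. It works just as it does with any other C program. Make sure debug symbols are enabled in the executable (for example, by using a debug build or a custom Cargo configuration). To analyze the generated heap profile you can use, say, http://massiftool.sourceforge.net/.

(I verified that this works on Debian Jessie; in a different setup your mileage may vary.)

(In order to use Rust with Valgrind, you may have to switch back to the system allocator.)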
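As a minimal sketch of that switch: since Rust 1.32 the system allocator is already the default for executables, so the declaration below only matters if the project (or an older toolchain) selects jemalloc. The static name GLOBAL and the 4096-byte buffer are arbitrary choices for illustration.

```rust
use std::alloc::System;

// Route every heap allocation through the platform allocator (malloc/free on
// most Unix targets), which is what Valgrind knows how to intercept.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // An ordinary heap allocation; under Massif it shows up in the heap profile.
    let data: Vec<u8> = vec![0; 4096];
    println!("allocated {} bytes through the system allocator", data.len());
}
```

If the program already declares jemalloc as its global allocator, this declaration has to replace that one, since only one #[global_allocator] is allowed per program.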

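P.S. There is now also a better tool, DHAT.

jemalloc can be told to dump a memory profile. You can probably do this through the Rust FFI, but I have not investigated that route.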

Answer 2

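As far as measuring data structure sizes is concerned, this can be done fairly easily with traits and a small compiler plugin. Nicholas Nethercote demonstrates how it works in Servo in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust); it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is also a good way to check precisely where memory is going. It is comparatively intrusive, however, because it requires code changes up front, whereas something like jemalloc's print_stats() does not. Still, for good and precise measurements, it is a sound approach.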
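To make the trait-based idea concrete, here is a minimal sketch with a hand-written trait. The names HeapSize, heap_size, and Record are hypothetical and are not Servo's actual API; the manual impl for Record is roughly what a derive would be expected to generate. Sizes are estimated from capacities, so allocator rounding and bookkeeping overhead are not included.

```rust
use std::mem::size_of;

/// Hypothetical trait: bytes a value owns on the heap (the value itself excluded).
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for u32 {
    // Plain integers own no heap memory.
    fn heap_size(&self) -> usize {
        0
    }
}

impl HeapSize for String {
    // A String owns one buffer of `capacity` bytes.
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    // The element buffer, plus whatever each element owns in turn.
    fn heap_size(&self) -> usize {
        self.capacity() * size_of::<T>()
            + self.iter().map(|item| item.heap_size()).sum::<usize>()
    }
}

struct Record {
    name: String,
    samples: Vec<u32>,
}

// Written by hand here; a derive macro would sum the fields the same way.
impl HeapSize for Record {
    fn heap_size(&self) -> usize {
        self.name.heap_size() + self.samples.heap_size()
    }
}

fn main() {
    let record = Record {
        name: String::from("sensor-a"),
        samples: Vec::with_capacity(16),
    };
    println!("heap bytes owned by record: {}", record.heap_size());
}
```

Because the result is built field by field, the same traversal can be extended to report per-field totals, which is what makes this approach useful for finding where memory goes.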

Answer 3

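Currently, the only way to get allocation information is the alloc::heap::stats_print(); method (behind #![feature(alloc)]), which calls jemalloc's print_stats().

I will update this answer with more information once I have learned what the output means.

(Note that I am not going to accept this answer, so if someone comes up with a better solution...)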

Answer 4

There is now the jemalloc_ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:

[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"

Then configure jemalloc as the global allocator and use the methods in the jemalloc_ctl::stats module. Here is the official example:

use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}

Answer 5

Someone has put together a neat little solution here: https://github.com/discordance/trallocator/blob/master/src/lib.rs

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }

    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}

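Usage: (from: https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9)

// Needed for the Trallocator struct on older nightly toolchains only.
// Both features are stable now (AtomicU64 since 1.34, trait bounds on
// const fn since 1.61), so remove this line on a current stable compiler.
#![feature(integer_atomics, const_fn_trait_bound)]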

use std::alloc::System;

#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // For some reason this does not print zero =/
    println!("memory used: {} bytes", GLOBAL.get());
}

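I have only just started using it, and it seems to work well. It is straightforward, works in real time, needs no external packages, and does not require changing the underlying memory allocator.

It is also nice because it intercepts the allocate/deallocate calls, so you should be able to add custom logic as needed (for example, printing a stack trace when memory usage exceeds X, to see what is triggering the allocations), although I have not tried this yet.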
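As one example of such custom logic, the sketch below extends the same wrapper idea with a high-water mark and a helper that measures a closure. PeakAlloc, GLOBAL, and measure are hypothetical names, and the numbers are only meaningful if no other thread allocates while the closure runs. It does not attempt the stack-trace idea, since capturing a backtrace allocates and would re-enter the allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: counts live bytes and remembers the highest count seen.
pub struct PeakAlloc<A: GlobalAlloc> {
    inner: A,
    live: AtomicUsize,
    peak: AtomicUsize,
}

impl<A: GlobalAlloc> PeakAlloc<A> {
    pub const fn new(inner: A) -> Self {
        PeakAlloc { inner, live: AtomicUsize::new(0), peak: AtomicUsize::new(0) }
    }
    pub fn live(&self) -> usize {
        self.live.load(Ordering::Relaxed)
    }
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }
    /// Restart peak tracking from the current live byte count.
    pub fn reset_peak(&self) {
        self.peak.store(self.live(), Ordering::Relaxed);
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for PeakAlloc<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(l) };
        if !ptr.is_null() {
            // Count only successful allocations, then raise the peak if needed.
            let now = self.live.fetch_add(l.size(), Ordering::Relaxed) + l.size();
            self.peak.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        unsafe { self.inner.dealloc(ptr, l) };
        self.live.fetch_sub(l.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: PeakAlloc<System> = PeakAlloc::new(System);

/// Runs `f`; returns its result, the net change in live bytes, and the peak
/// number of bytes above the level at which `f` started.
fn measure<T>(f: impl FnOnce() -> T) -> (T, isize, usize) {
    let before = GLOBAL.live();
    GLOBAL.reset_peak();
    let out = f();
    let net = GLOBAL.live() as isize - before as isize;
    let peak = GLOBAL.peak().saturating_sub(before);
    (out, net, peak)
}

fn main() {
    let (sum, net, peak) = measure(|| {
        let v: Vec<u64> = (0..1000).collect();
        // Keep the optimizer from eliding the allocation in release builds.
        std::hint::black_box(&v);
        v.iter().sum::<u64>()
    });
    println!("sum={sum} net={net} bytes peak={peak} bytes");
}
```

The counters use the requested layout sizes, so they reflect what the program asked for rather than what the underlying allocator actually reserved.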

I have not yet tested how much overhead this approach adds. If anyone does test it, please let me know!
