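Automated tests of our program found a very rare problem in one module. The test runs a sample program that uses our library under valgrind (valgrind-3.18.1). The sample program appears to finish correctly, but afterwards valgrind starts complaining like this: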
[FooTest] (... redacted program stdout ...)
[FooTest] ==2==
[FooTest] ==2== HEAP SUMMARY:
[FooTest] ==2== in use at exit: 473,112 bytes in 4,756 blocks
[FooTest] ==2== total heap usage: 27,292 allocs, 22,536 frees, 29,352,230 bytes allocated
[FooTest] ==2==
[FooTest]
[FooTest] Memcheck: mc_main.c:5765 (vgMemCheck_is_valid_aligned_word): Assertion 'VG_IS_WORD_ALIGNED(a)' failed.
[FooTest]
[FooTest] host stacktrace:
[FooTest] ==2== at 0x58042F3A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x58043067: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x5804320B: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x58010FC1: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x580020C4: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x58002427: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x58002895: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x58002B42: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x5800437F: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x5800F2C2: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x580B2214: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest] ==2== by 0x580E4D53: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
[FooTest]
[FooTest] sched status:
[FooTest] running_tid=1
[FooTest]
[FooTest]
[FooTest] Note: see also the FAQ in the source distribution.
[FooTest] It contains workarounds to several common problems.
[FooTest] In particular, if Valgrind aborted or crashed after
[FooTest] identifying problems in your program, there's a good chance
[FooTest] that fixing those problems will prevent Valgrind aborting or
[FooTest] crashing, especially if it happened in m_mallocfree.c.
[FooTest]
[FooTest] If that doesn't help, please report this bug to: www.valgrind.org
[FooTest]
[FooTest] In the bug report, send all the above text, the valgrind
[FooTest] version, and what OS and version you are using. Thanks.
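I cannot reproduce it locally at all, and on the build server it happens only rarely. Compiling and running the sample program with the address or UB sanitizer did not show any problem. Reviewing the code did not reveal anything either.
What does this error mean? Is valgrind complaining about my program, or did valgrind itself crash? Either way, do you have any pointers on how to find the problem?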
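Alignment problems usually happen when you reinterpret a pointer to something as a pointer to something larger and the address is not a multiple of the larger type's alignment. This is the most common mistake made during deserialization.
For example, you read some data from a file or from the network and you have a byte array, char buffer[256]. Now you read an int from it with a reinterpret_cast, and the pointer does not hold a round address (one that is not a multiple of 4).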
Demo:
#include <iostream>

int main() {
    char data[256];
    std::cin.read(data, sizeof data);
    auto count = std::cin.gcount();
    std::cout << count << '\n';
    // The reads below touch bytes 0..6, so require at least 8 bytes of input.
    if (count < 8) {
        std::cerr << "too short input\n";
        return 1;
    }
    // Only the first load can be aligned; the other three are misaligned for int.
    std::cout << std::hex << *reinterpret_cast<int*>(data) << '\n';
    std::cout << std::hex << *reinterpret_cast<int*>(data + 1) << '\n';
    std::cout << std::hex << *reinterpret_cast<int*>(data + 2) << '\n';
    std::cout << std::hex << *reinterpret_cast<int*>(data + 3) << '\n';
}
This produces the following report from the undefined behavior sanitizer:
/app/example.cpp:16:67: runtime error: load of misaligned address 0x7f3994d00021 for type 'int', which requires 4 byte alignment
0x7f3994d00021: note: pointer points here
7f 00 00 64 61 73 64 61 73 64 61 73 64 61 61 73 64 66 73 00 00 00 00 00 00 00 00 00 00 00 00 00
^
/app/example.cpp:17:67: runtime error: load of misaligned address 0x7f3994d00022 for type 'int', which requires 4 byte alignment
0x7f3994d00022: note: pointer points here
00 00 64 61 73 64 61 73 64 61 73 64 61 61 73 64 66 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
/app/example.cpp:18:67: runtime error: load of misaligned address 0x7f3994d00023 for type 'int', which requires 4 byte alignment
0x7f3994d00023: note: pointer points here
00 64 61 73 64 61 73 64 61 73 64 61 61 73 64 66 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
I'm pretty sure valgrind is reporting an error of this kind in your case.
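If you are using a modern compiler, I really recommend replacing valgrind with sanitizers. To use them you only need to add the appropriate compile flags to your build, for example -fsanitize=address,undefined with GCC or Clang (see the godbolt link I provided).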