UPDATE-1: made the examples more realistic
SUSE Tumbleweed, clang 19.1.4, gcc 14.2.1, valgrind 3.24.0 (built from source)
In my_prog.cpp:
int proc()
{
    int y;
    return y;
}

int main()
{
    int y = proc();
    int x;
    return x+y;
}
Using g++ and valgrind:
g++ -g -O0 -fno-omit-frame-pointer my_prog.cpp
valgrind --tool=memcheck --leak-check=no --track-origins=yes ./a.out
This only gives me line 2, the start of the function (int proc()):
==469== Memcheck, a memory error detector
==469== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==469== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==469== Command: ./a.out
==469==
==469== Syscall param exit_group(status) contains uninitialised byte(s)
==469== at 0x4CC909D: _Exit (in /usr/lib64/libc.so.6)
==469== by 0x4C26EB5: __run_exit_handlers (in /usr/lib64/libc.so.6)
==469== by 0x4C26FFF: exit (in /usr/lib64/libc.so.6)
==469== by 0x4C0D2B4: (below main) (in /usr/lib64/libc.so.6)
==469== Uninitialised value was created by a stack allocation
==469== at 0x401116: proc() (my_prog.cpp:2)
==469==
==469==
==469== HEAP SUMMARY:
==469== in use at exit: 0 bytes in 0 blocks
==469== total heap usage: 1 allocs, 1 frees, 73,728 bytes allocated
==469==
==469== For a detailed leak analysis, rerun with: --leak-check=full
==469==
==469== For lists of detected and suppressed errors, rerun with: -s
==469== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Using clang++ and MemorySanitizer for comparison (I am aware of the huge burden of building all dependencies with MSan, which is why I use valgrind):
clang++ -g -O0 -fno-omit-frame-pointer -fsanitize=memory my_prog.cpp
This gives me line 4 (return y;) and line 9 (int y = proc();), the exact location of the first use of the uninitialized variable, and then exits:
==480==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x564ffb0f818a in proc() /home/linux/temp/my_prog.cpp:4:5
#1 0x564ffb0f820a in main /home/linux/temp/my_prog.cpp:9:13
#2 0x7fc2d66092ad in __libc_start_call_main (/lib64/libc.so.6+0x2a2ad) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
#3 0x7fc2d6609378 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a378) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
#4 0x564ffb05b104 in _start /home/abuild/rpmbuild/BUILD/glibc-2.40/csu/../sysdeps/x86_64/start.S:115
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/linux/temp/my_prog.cpp:4:5 in proc()
Exiting
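Are there any tricks, compiler settings, or debugger techniques that would let me pinpoint the variable from valgrind's output in this unrealistically small example? (Bear in mind that my real scenario is much larger and older.)
Another scenario:
This is a small example, but in reality the uninitialized access is buried under more than 100k lines of inlined code. The function is too large (and not developed by me) to quickly get a clue which variable the problem comes from.
Building with UBSan (in the MSan example) and -Werror -Wall gives no UBSan warning and no ASan warning. Only MSan (very detailed) and Valgrind report anything, and Valgrind only names the start of main as the place where it happens. In my real scenario there are many more variables throughout the function than in this example.
What can I do now to get more out of valgrind's report (additional options, other complementary tools that help valgrind?), either to be better prepared for the future or to extend my CI server so that it provides better information?
MSan is out of scope because 20 third-party libraries are used (not even all of their sources are available), which is the reason for using valgrind.
I am only interested in finding uninitialized memory; ASan has already been in use for years to find other memory-related problems.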
union blub_t
{
    short a;
    int b;
};

int main()
{
    blub_t g;
    g.a = 10;
    return g.b;
}
Build + valgrind:
clang++ -g -O0 -fno-omit-frame-pointer -Werror -Wall my_prog.cpp
valgrind --tool=memcheck --leak-check=no --track-origins=yes ./a.out
gives
==636== Memcheck, a memory error detector
==636== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==636== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==636== Command: ./a.out
==636==
==636== Syscall param exit_group(status) contains uninitialised byte(s)
==636== at 0x4CC909D: _Exit (in /usr/lib64/libc.so.6)
==636== by 0x4C26EB5: __run_exit_handlers (in /usr/lib64/libc.so.6)
==636== by 0x4C26FFF: exit (in /usr/lib64/libc.so.6)
==636== by 0x4C0D2B4: (below main) (in /usr/lib64/libc.so.6)
==636== Uninitialised value was created by a stack allocation
==636== at 0x401116: main (my_prog.cpp:8)
==636==
==636==
==636== HEAP SUMMARY:
==636== in use at exit: 0 bytes in 0 blocks
==636== total heap usage: 1 allocs, 1 frees, 73,728 bytes allocated
==636==
==636== For a detailed leak analysis, rerun with: --leak-check=full
==636==
==636== For lists of detected and suppressed errors, rerun with: -s
==636== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Building with MSan:
clang++ -g -O0 -fno-omit-frame-pointer -fsanitize=memory,undefined -Werror -Wall my_prog.cpp
gives
==585==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x559d2f9471ec in main /home/linux/temp/my_prog.cpp:11:5
#1 0x7f47941bf2ad in __libc_start_call_main (/lib64/libc.so.6+0x2a2ad) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
#2 0x7f47941bf378 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a378) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
#3 0x559d2f8aa104 in _start /home/abuild/rpmbuild/BUILD/glibc-2.40/csu/../sysdeps/x86_64/start.S:115
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/linux/temp/my_prog.cpp:11:5 in main
Exiting
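Why both tools flag return g.b in the union scenario can be illustrated with a toy per-byte "defined" map. This is only a rough sketch: ShadowBytes and undefined_bytes_returned are hypothetical names made up for illustration, and the real tools track definedness in their own, more detailed way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

union blub_t
{
    short a;
    int b;
};

// Toy per-byte "defined" map for one blub_t object (hypothetical helper,
// not part of Valgrind or MSan).
struct ShadowBytes
{
    std::array<bool, sizeof(blub_t)> defined{};  // all false: nothing written yet

    // Record that n bytes starting at offset have been written.
    void mark_written(std::size_t offset, std::size_t n)
    {
        assert(offset + n <= defined.size());
        for (std::size_t i = offset; i < offset + n; ++i)
            defined[i] = true;
    }

    // Count the bytes in [offset, offset + n) that were never written.
    std::size_t undefined_in(std::size_t offset, std::size_t n) const
    {
        assert(offset + n <= defined.size());
        std::size_t count = 0;
        for (std::size_t i = offset; i < offset + n; ++i)
            if (!defined[i])
                ++count;
        return count;
    }
};

// Replays `g.a = 10; return g.b;` on the shadow map.
// Both union members start at offset 0.
std::size_t undefined_bytes_returned()
{
    ShadowBytes shadow;
    shadow.mark_written(0, sizeof(blub_t::a));         // g.a = 10
    return shadow.undefined_in(0, sizeof(blub_t::b));  // return g.b
}
```

On a platform where int is wider than short, undefined_bytes_returned() is sizeof(int) - sizeof(short), which are the bytes of g.b that g.a = 10 never touched.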
If I change the code to
#include "valgrind/memcheck.h"

int proc()
{
    int easy_to_find;
    (void)VALGRIND_CHECK_MEM_IS_DEFINED(&easy_to_find, sizeof(easy_to_find));
    return easy_to_find;
}

int main()
{
    int y = proc();
    int x;
    return x+y;
}
and then run it with the right options
valgrind --track-origins=yes --read-var-info=yes ./so13
==395216== Memcheck, a memory error detector
==395216== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==395216== Using Valgrind-3.25.0.GIT and LibVEX; rerun with -h for copyright info
==395216== Command: ./so13
==395216==
==395216== Uninitialised byte(s) found during client check request
==395216== at 0x401378: proc() (so13.cpp:7)
==395216== by 0x401394: main (so13.cpp:13)
==395216== Location 0x1ffefff25c is 0 bytes inside local var "easy_to_find"
==395216== declared at so13.cpp:6, in frame #0 of thread 1
==395216== Uninitialised value was created by a stack allocation
==395216== at 0x401326: proc() (so13.cpp:5)
==395216==
==395216== Syscall param exit_group(status) contains uninitialised byte(s)
==395216== at 0x512F336: _Exit (in /usr/lib64/libc-2.28.so)
==395216== by 0x5077DC9: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==395216== by 0x5077DFF: exit (in /usr/lib64/libc-2.28.so)
==395216== by 0x50617EB: (below main) (in /usr/lib64/libc-2.28.so)
==395216== Uninitialised value was created by a stack allocation
==395216== at 0x401326: proc() (so13.cpp:5)
==395216==
==395216==
==395216== HEAP SUMMARY:
==395216== in use at exit: 0 bytes in 0 blocks
==395216== total heap usage: 1 allocs, 1 frees, 73,728 bytes allocated
==395216==
==395216== All heap blocks were freed -- no leaks are possible
==395216==
==395216== For lists of detected and suppressed errors, rerun with: -s
==395216== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
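For a larger code base, the client request could be wrapped in a macro so that the checks compile away on machines without the Valgrind headers. The following is a minimal sketch under that assumption; CHECK_DEFINED and sum_first are hypothetical names chosen for illustration, not part of Valgrind or of the code in question.

```cpp
#include <cassert>

// Use the real Memcheck client request when the Valgrind headers are
// available; otherwise turn the check into a no-op.
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define CHECK_DEFINED(var) \
         ((void)VALGRIND_CHECK_MEM_IS_DEFINED(&(var), sizeof(var)))
#  endif
#endif
#ifndef CHECK_DEFINED
#  define CHECK_DEFINED(var) ((void)0)
#endif

// Hypothetical function that fills only the first n_copied slots of a
// local buffer, so later slots stay uninitialised when n_copied < 4.
int sum_first(const int* src, int n_copied)
{
    assert(n_copied >= 0 && n_copied <= 4);

    int buf[4];
    for (int i = 0; i < n_copied; ++i)
        buf[i] = src[i];

    // Under Memcheck this asks whether every byte of buf is defined;
    // outside Valgrind (or without the headers) it does nothing.
    CHECK_DEFINED(buf);

    return buf[0] + buf[1] + buf[2] + buf[3];
}
```

Placing such checks just before values leave a function (return statements, stores into long-lived objects) should make Memcheck report while the frame is still live, which is what allowed --read-var-info=yes to name the variable in the run above.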
I think the problem is that the error occurs after proc() has returned. dwarfdump of the exe gives me
0x000002a2: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("proc")
DW_AT_decl_file ("/path/to/so13.cpp")
DW_AT_decl_line (4)
DW_AT_decl_column (0x05)
DW_AT_linkage_name ("_Z4procv")
DW_AT_type (0x0000029b "int")
DW_AT_low_pc (0x0000000000401326)
DW_AT_high_pc (0x0000000000401388)
DW_AT_frame_base (DW_OP_call_frame_cfa)
DW_AT_call_all_calls (true)
DW_AT_sibling (0x00000306)
0x000002c8: DW_TAG_variable
DW_AT_name ("easy_to_find")
DW_AT_decl_file ("/path/to/so13.cpp")
DW_AT_decl_line (6)
DW_AT_decl_column (0x09)
DW_AT_type (0x0000029b "int")
DW_AT_location (DW_OP_fbreg -20)
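I'm not really a DWARF expert, but I think this means that while the instruction pointer is between DW_AT_low_pc and DW_AT_high_pc, the file and line information can be obtained from the DW_TAG_variable entry, using the address computed from the function's DW_AT_frame_base and the variable's location, DW_OP_fbreg -20. I'm missing the variable's lexical scope here. All of this makes me think that it is not feasible. At the point where the error occurs, after main, we no longer have the values of the instruction pointer or the frame pointer that are needed to get the source file information from the DWARF debuginfo.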
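The lookup described above can be sketched roughly as follows. This is a simplified illustration, not how Valgrind implements it: Subprogram, LocalVar and the three functions are hypothetical names, DW_AT_high_pc is assumed to be already resolved to an address as dwarfdump prints it, and the frame base value used in the usage note is inferred backwards from the reported location rather than taken from any output.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the two DWARF entries shown above
// (hypothetical types, only the fields needed here).
struct Subprogram
{
    std::uint64_t low_pc;   // DW_AT_low_pc
    std::uint64_t high_pc;  // DW_AT_high_pc, first address past the function
};

struct LocalVar
{
    const char*   name;          // DW_AT_name
    std::int64_t  fbreg_offset;  // operand of DW_OP_fbreg
    std::uint64_t size;          // size of the variable's type in bytes
};

// True if pc lies inside the function, so its frame description applies.
bool pc_in_function(const Subprogram& f, std::uint64_t pc)
{
    assert(f.low_pc <= f.high_pc);
    return pc >= f.low_pc && pc < f.high_pc;
}

// Address of the variable: frame base (here the CFA) plus the signed offset.
std::uint64_t var_address(const LocalVar& v, std::uint64_t cfa)
{
    return cfa + static_cast<std::uint64_t>(v.fbreg_offset);
}

// True if addr falls inside the variable for this pc and frame base.
bool addr_in_var(const Subprogram& f, const LocalVar& v,
                 std::uint64_t pc, std::uint64_t cfa, std::uint64_t addr)
{
    if (!pc_in_function(f, pc))
        return false;  // outside the function: this description does not apply
    const std::uint64_t start = var_address(v, cfa);
    return addr >= start && addr < start + v.size;
}
```

With low_pc 0x401326, high_pc 0x401388, an offset of -20 and a 4-byte int, the reported location 0x1ffefff25c matches easy_to_find for pc 0x401378 if the CFA was 0x1ffefff270 (an inferred value). For a pc outside proc(), such as 0x401394 in main, the lookup fails, which is the situation at the time of the exit_group error.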
I will look at the information that --track-origins=yes retains to see whether it can be extended to provide more detail.