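I built the program with the classic configure, make, make install. A few months later, the program crashed. I still have the build directory, with the source code and the unstripped executable. From there, I invoke gdb like this: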
530-north:courier$ gdb -q --core /tmp/core_epoch\=1667475742_pid\=23653_file\=\!usr\!local\!libexec\!courier\!courierd courierd
Reading symbols from courierd...
[New LWP 23653]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/libexec/courier/courierd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000561e841e5afd in msgq::completed(drvinfo&, unsigned long) ()
(gdb) info args
No symbol table info available.
Using bt, I can see a long chain of calls alternating between two functions:
#0 0x0000561e841e5afd in msgq::completed(drvinfo&, unsigned long) ()
#1 0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#2 0x0000561e841e5bd8 in msgq::completed(drvinfo&, unsigned long) ()
#3 0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#4 0x0000561e841e5bd8 in msgq::completed(drvinfo&, unsigned long) ()
...
#204 0x0000561e841e5a17 in msgq::completed(drvinfo&, unsigned long) ()
#205 0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#206 0x0000561e841e5a17 in msgq::completed(drvinfo&, unsigned long) ()
#207 0x0000561e841e70fe in courierbmain() ()
#208 0x0000561e841dd030 in main ()
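Every two calls advance the stack by 0x110, for a total of about 27 KB, which is far less than the 132 KB of stack allocated to the running process, so this is not a stack overflow. The SIGSEGV could come from a null pointer or anything else. Why doesn't gdb point at it? This is GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git, by the way.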
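If I omit the last argument to gdb, bt doesn't show function names. Did I mess up the compilation? In config.log, I see that I have 'CFLAGS= -march=nocona -O2 -g' 'LDFLAGS= -march=nocona -O2' 'CXXFLAGS= -march=nocona -O2 -std=c++11'. The source files are C++. Maybe I missed some -g? And yet, some symbols are there...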
"Why doesn't gdb point at it?"
Because you haven't compiled your program with proper debug info.
You'll have to debug this crash at the assembly level. Start with disassemble $pc and info registers.
"The source files are C++. Maybe I missed some -g?"
Yes: your CXXFLAGS doesn't have -g.
"And yet, some symbols are there..."
On UNIX systems (unlike Windows), function names (symbols) are present by default even without -g. There is no contradiction here.
Update:
"But function names are not displayed if I don't pass the non-stripped file as an argument."
Yes: strip removes both the symbols and the debug info.
You can observe this with a simple test:
// t.cc
#include <cstdlib>
struct S {
  void fn() { abort(); }
};

int main()
{
  S().fn();
}
First, let's see how it works when the binary is correctly built for debugging:
g++ -g t.cc -o a.out && strip ./a.out -o a.out.stripped &&
./a.out.stripped; gdb -q --batch -ex where ./a.out core
Aborted (core dumped)
...
warning: core file may not match specified executable file.
[New LWP 476070]
Core was generated by `./a.out.stripped'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f12444895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
#2 __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
#3 0x00007f12445f5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f1244428469 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000055de28a24165 in S::fn (this=0x7ffcd0d1d80f) at t.cc:4
#6 0x000055de28a2414d in main () at t.cc:9
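Note the presence of both file/line info and function names. If we use the stripped version instead, neither is present for the program's own frames:
Core was generated by `./a.out.stripped'.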
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f12444895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
#2 __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
#3 0x00007f12445f5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f1244428469 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000055de28a24165 in ?? ()
#6 0x000055de28a2414d in ?? ()
#7 0x00007f124442920a in __libc_start_call_main (main=main@entry=0x55de28a24139, argc=argc@entry=1, argv=argv@entry=0x7ffcd0d1d928) at ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x00007f12444292bc in __libc_start_main_impl (main=0x55de28a24139, argc=1, argv=0x7ffcd0d1d928, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcd0d1d918) at ../csu/libc-start.c:389
#9 0x000055de28a24071 in ?? ()
Now let's repeat this with an incorrectly built binary (which is what you have):
g++ t.cc -o b.out && strip ./b.out -o b.out.stripped &&
./b.out.stripped; gdb -q --batch -ex where ./b.out core
Aborted (core dumped)
...
warning: core file may not match specified executable file.
[New LWP 478614]
Core was generated by `./b.out.stripped'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f21a0a895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
#2 __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
#3 0x00007f21a0bf5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f21a0a28469 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000056049b052165 in S::fn() ()
#6 0x000056049b05214d in main ()
Note the presence of function names (S::fn(), main), but the absence of file/line/argument info. This matches what you observed.
If you try again with b.out.stripped, you get the same result as the last run with a.out.stripped:
Core was generated by `./b.out.stripped'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007f21a0a895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
#2 __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
#3 0x00007f21a0bf5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f21a0a28469 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000056049b052165 in ?? ()
#6 0x000056049b05214d in ?? ()
#7 0x00007f21a0a2920a in __libc_start_call_main (main=main@entry=0x56049b052139, argc=argc@entry=1, argv=argv@entry=0x7fff3554bc78) at ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x00007f21a0a292bc in __libc_start_main_impl (main=0x56049b052139, argc=1, argv=0x7fff3554bc78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff3554bc68) at ../csu/libc-start.c:389
#9 0x000056049b052071 in ?? ()
"Besides, readelf --debug-dump=info courierd shows lots of version 4 stuff."
Yes, if you run readelf --debug-dump b.out, you can observe lots of DWARF4 stuff coming from crt0.o, crtbegin.o, etc. (depending on how your GCC and GLIBC were built).
If you linked in any .c files, these will also contain DWARF4 debug info, since your CFLAGS do include -g.
But none of that DWARF4 stuff will have come from wherever msgq::completed is defined.