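While testing on one of our Linux boxes, we saw our userspace program stop running, and the debug output listed below was printed to the box's serial port.
I think this indicates a problem in the Linux kernel, in a kernel-mode driver, or in the hardware, but I would like to confirm: given Linux's MMU/memory protection, it should be impossible for a user-mode program to cause this fault, so does its presence mean there must be a problem at the kernel or hardware level?
OTOH, if a user-mode program can trigger this on a healthy Linux box, I would be interested in details of any known mechanism by which that can happen.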
[ 513.939323] redactedd[2619]: segfault at 48f0246f ip 00007f11adf0ea45 sp 00007fff238f05b8 error 6 in libc-2.7.so[7f11ade97000+14a000]
[ 513.965679] redactedd[2493]: segfault at 48f024f3 ip 00007f0b09ea2a45 sp 00007fff6d0537d8 error 6 in libc-2.7.so[7f0b09e2b000+14a000]
[ 513.997520] general protection fault: 0000 [#1] SMP
[ 514.001002] last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:04:00.0/port_control_monitor
[ 514.001002] CPU 2
[ 514.001002] Modules linked in: bonding adt7475 redacted_fpga ioatdmart e1000e [last unloaded: scsi_wait_scan]
[ 514.001002] Pid: 2629, comm: redactedd Not tainted 2.6.31.8 #1 Redacted 09:58:40]
[ 514.001002] RIP: 0010:[<ffffffff8137c33b>]
[ 514.062841] Outputs Process[2644]: segfault at 8359f44 ip 00000000008c58d1 sp 000000004011f110 error 6 in redactedd[400000+153c000]
[ 514.001002] [<ffffffff8137c33b>] tcp_v4_destroy_sock+0xdb/0x1c0
[ 514.001002] RSP: 0018:ffff88007d00bdb8 EFLAGS: 00010296
[ 514.001002] RAX: ffffffff815b0120 RBX: ffff88007c959a00 RCX: ffffffff81367104
[ 514.001002] RDX: 0000000000000000 RSI: ffff88007c959b78 RDI: ffff00007c959ee0
[ 514.001002] RBP: ffff88007c959ee0 R08: 0000000000000000 R09: 0000000000000001
[ 514.140837] redactedd[2667]: segfault at 60a76760 ip 00007f5660046c72 sp 00007fff82d80ac8 error 6 in libpthread-2.7.so[7f566003d000+16000]
[ 514.140860] R10: 0000000000000000 R11: 0000000000000014 R12: ffff88004d136600
[ 514.140860] R13: 0000000000000000 R14: ffff88007c959fa0 R15: 00007f886c05ea30
[ 514.140860] FS: 00007f8879e0f760(0000) GS:ffff880001731000(0000) knlGS:0000000000000000
[ 514.140860] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 514.140860] CR2: 0000000048f024f3 CR3: 000000007d01c000 CR4: 00000000000006e0
[ 514.140860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 514.140860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 514.140860] Process redactedd (pid: 2629, threadinfo ffff88007d00a000, task ffff88007e0e8000)
[ 514.140860] Stack:
[ 514.140860] ffff88007c959a00 ffff88007c959a00 ffff88004d136600 ffffffff813d1fe9
[ 514.140860] <0> ffff88007c959a00 ffffffff8136831a ffff88004d1144b0 ffffffff81375f3a
[ 514.140860] <0> ffff88004d136600 ffff88007c959a00 ffff88004d1144b0 0000000000000000
[ 514.140860] Call Trace:
[ 514.140860] [<ffffffff813d1fe9>] ? tcp_v6_destroy_sock+0x9/0x20
[ 514.140860] [<ffffffff8136831a>] ? inet_csk_destroy_sock+0x4a/0x130
[ 514.140860] [<ffffffff81375f3a>] ? tcp_rcv_state_process+0x88a/0xc40
[ 514.140860] [<ffffffff813d2c6f>] ? tcp_v6_do_rcv+0x11f/0x3d0
[ 514.140860] [<ffffffff8133b97b>] ? __alloc_skb+0x6b/0x170
[ 514.140860] [<ffffffff813359ab>] ? release_sock+0x4b/0xa0
[ 514.140860] [<ffffffff8136aa89>] ? tcp_close+0x169/0x470
[ 514.140860] [<ffffffff8138b2ae>] ? inet_release+0x3e/0x70
[ 514.140860] [<ffffffff813336c1>] ? sock_release+0x21/0x90
[ 514.140860] [<ffffffff81333742>] ? sock_close+0x12/0x30
[ 514.140860] [<ffffffff811001ad>] ? __fput+0xcd/0x1e0
[ 514.140860] [<ffffffff810fca9b>] ? filp_close+0x5b/0x90
[ 514.140860] [<ffffffff810fcb76>] ? sys_close+0xa6/0x100
[ 514.140860] [<ffffffff8100bdbf>] ? system_call_fastpath+0x16/0x1b
[ 514.140860] Code: 00 00 00 ff 0f 00 00 0f 8f eb 00 00 00 48 8d ab e0 04 00 00 48 8b bb e0 04 00 00 48 39 fd 74 36 48 85 ff 74 31 ff 8b f0 04 00 00 <48> 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00
[ 514.140860] RIP [<ffffffff8137c33b>] tcp_v4_destroy_sock+0xdb/0x1c0
[ 514.140860] RSP <ffff88007d00bdb8>
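
As a reading aid for the dump above, here is a minimal Python sketch that decodes two of its fields. It assumes the standard x86 page-fault error-code layout (bit 0 = protection violation rather than a not-present page, bit 1 = write, bit 2 = user mode) and 48-bit virtual addressing, where bits 63..47 of a canonical address must all be equal. The helper names `decode_page_fault_error` and `is_canonical` are made up for illustration and are not part of any kernel tool.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "protection_violation": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = fault in kernel mode, 1 = fault in user mode
        "user_mode": bool(code & 0x4),
    }


def is_canonical(addr: int) -> bool:
    """Check x86-64 canonical form, assuming 48-bit virtual addresses.

    Bits 63..47 (17 bits) must be all zeros or all ones.
    """
    top = (addr >> 47) & 0x1FFFF
    return top == 0 or top == 0x1FFFF


# "error 6" from the segfault lines
print(decode_page_fault_error(6))

# RBP and RDI from the register dump
print(is_canonical(0xFFFF88007C959EE0))  # RBP
print(is_canonical(0xFFFF00007C959EE0))  # RDI
```

Under those assumptions, `error 6` reads as a user-mode write to a page that is not present, and RDI (`ffff00007c959ee0`) is non-canonical while RBP (`ffff88007c959ee0`) is canonical. The bytes at the marked position in the `Code:` line, `<48> 8b 17`, disassemble to `mov (%rdi),%rdx`, so dereferencing a non-canonical RDI would be consistent with a general protection fault being reported rather than a page fault. This does not by itself say what corrupted the pointer.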