Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933734AbcLSSJm (ORCPT ); Mon, 19 Dec 2016 13:09:42 -0500 Received: from caffeine.csclub.uwaterloo.ca ([129.97.134.17]:36678 "EHLO caffeine.csclub.uwaterloo.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932401AbcLSSJk (ORCPT ); Mon, 19 Dec 2016 13:09:40 -0500 Date: Mon, 19 Dec 2016 13:09:39 -0500 To: linux-kernel@vger.kernel.org Cc: Len Sorensen Subject: Debug hints for fpu state NULL pointer dereference on context switch during core dump in 3.0.101 Message-ID: <20161219180939.GA17367@csclub.uwaterloo.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6022 Lines: 104 I am trying to debug a problem that has been happening occationally for years on some of our systems running 3.0.101 kernel (yes I know it is old, we are moving to 4.9 at the moment but I would like older releases to be fixed too, assuming 4.9 makes this problem disappear). What is happening is that once in a while a process does something wrong and segfaults, and dumps core. We have a handler to process the core dump to name it and compress it and make sure we don't keep to many around, so the core_pattern uses the pipe option to pipe the dump to a shell script that saves it with the pid and current timestamp and gzips it. Once in a while when this happens, the kernel hits a null pointer dereference in fpu.state->xsave while doing __switch_to. The system ix x86_64 with dual E5-2620 CPUs (6 cores each with hyperthreading). Some people think they have seen it on other systems, but are not sure. I have not been able to trigger it on other systems yet. It used to take about a week of running tests to trigger it, but I have now managed to hit it in a few minutes pretty reliably. The way I trigger it is: abuse.c: #include #include #include #include #include #include #include int main() { pid_t pid; pid = getpid(); printf("%u: pid / PI = %f\n", pid, pid / M_PI); kill(pid, SIGSEGV); return 0; } I then run: for i in `seq 1 1 10000`; do ./abuse & echo -n; done That pretty reliably hits the problem. The crash dump we get is: Dec 19 12:24:03 HWC-64 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000033f Dec 19 12:24:03 HWC-64 kernel: IP: [] __switch_to+0x4c/0x2b0 Dec 19 12:24:03 HWC-64 kernel: PGD 39a6ce067 PUD 39a8cf067 PMD 0 Dec 19 12:24:03 HWC-64 kernel: Oops: 0002 [#1] SMP Dec 19 12:24:03 HWC-64 kernel: CPU 12 Dec 19 12:24:03 HWC-64 kernel: Modules linked in: dpi_drv(P) ccu_util(P) ipv4_mb(P) l2bridge_config_util(P) l2_config_util(P) route_config_util(P) qos_config_util(P) sysapp_common(P) chantry_fwd_eng_2800_config(P) shim_module(P) sadb_cc(P) ipsecXformer(P) libeCrypto(P) ipmatch_cc(P) l2h_cc(P) ndproxy_cc(P) arpint_cc(P) portinfo_cc(P) chantryqos_cc(P) redirector_cc(P) ix_ph(P) fpm_core_cc(P) pulse_cc(P) vnstt_cc(P) vnsap_cc(P) fm_cc(P) rutm_cc(P) mutm_cc(P) ethernet_tx_cc(P) stkdrv_cc(P) l2bridge_cc(P) events_util(P) sched_cc(P) qm_cc(P) ipv4_cc(P) wred_cc(P) tc_meter_cc(P) dscp_classifier_cc(P) classifier_6t_cc(P) ent586_cc(P) dev_cc_arp(P) chantry_fwd_eng_2800_tables(P) ether_arp_lib(P) rtmv4_lib(P) lkup_lib(P) l2tm_lib(P) fragmentation_lib(P) properties_lib(P) msg_support_lib(P) utilities_lib(P) cci_lib(P) rm_lib(P) libossl vip productSpec_x86_dp(P) ixgbe igb Dec 19 12:24:03 HWC-64 kernel: Dec 19 12:24:03 HWC-64 kernel: Pid: 16440, comm: coremgr.sh Tainted: P 3.0.101 #6 Intel Corporation S2600GZ ........../S2600GZ Dec 19 12:24:03 HWC-64 kernel: RIP: 0010:[] [] __switch_to+0x4c/0x2b0 Dec 19 12:24:03 HWC-64 kernel: RSP: 0018:ffff88042f2c9de8 EFLAGS: 00010002 Dec 19 12:24:03 HWC-64 kernel: RAX: 00000000ffffffff RBX: ffff8803d8266ae0 RCX: 000000000000000c Dec 19 12:24:03 HWC-64 kernel: RDX: 00000000ffffffff RSI: ffff88042f29b840 RDI: 0000000000000000 Dec 19 12:24:03 HWC-64 kernel: RBP: ffff88043f38da40 R08: ffff8803d8266e08 R09: 0000000100018011 Dec 19 12:24:03 HWC-64 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000000 Dec 19 12:24:03 HWC-64 kernel: R13: ffff88042ce62080 R14: 000000000000000c R15: ffff88042f29b840 Dec 19 12:24:03 HWC-64 kernel: FS: 00007f72f71a5700(0000) GS:ffff88043f380000(0000) knlGS:0000000000000000 Dec 19 12:24:03 HWC-64 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 19 12:24:03 HWC-64 kernel: CR2: 000000000000033f CR3: 00000003d8c70000 CR4: 00000000000406e0 Dec 19 12:24:03 HWC-64 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 19 12:24:03 HWC-64 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 19 12:24:03 HWC-64 kernel: Process coremgr.sh (pid: 16440, threadinfo ffff88039a00c000, task ffff8803d8266ae0) Dec 19 12:24:03 HWC-64 kernel: Stack: Dec 19 12:24:03 HWC-64 kernel: 0000000000000018 ffff88043f38fe40 ffff88043f38fe40 ffff88042f29b840 Dec 19 12:24:03 HWC-64 kernel: 0000000000000000 ffffffff81301c94 ffff88042f29b840 0000000000000046 Dec 19 12:24:03 HWC-64 kernel: ffff88042f2c9fd8 ffff88042f2c9fd8 000000000000a000 0000000000000000 Dec 19 12:24:03 HWC-64 kernel: Call Trace: Dec 19 12:24:03 HWC-64 kernel: Code: 48 63 c1 48 03 14 c5 40 a0 67 81 8b 87 b8 03 00 00 48 89 d5 85 c0 74 37 66 66 90 66 90 b8 ff ff ff ff 48 8b bf c0 03 00 00 89 c2 Dec 19 12:24:03 HWC-64 kernel: <48> 0f ae 37 48 8b 83 c0 03 00 00 f6 80 00 02 00 00 01 0f 84 88 Dec 19 12:24:03 HWC-64 kernel: RIP [] __switch_to+0x4c/0x2b0 Dec 19 12:24:03 HWC-64 kernel: RSP Dec 19 12:24:03 HWC-64 kernel: CR2: 000000000000033f Dec 19 12:24:03 HWC-64 kernel: ---[ end trace 366732e1020fb678 ]--- That is calling fpu_xsave in fpu_save_init, since it dereferences fpu->state->xsave which is somehow NULL even though the process apparently is flagged as using the FPU. Of course the process was just created so I am wondering if there is (was?) a race condition in fork that could somehow be triggered when handling a core dump. coremgr.sh is the script that is handling the core dump. Sometimes it is the task listed, sometimes one of the things it calls is the process listed in the dump. It is always something to do with the core dump handling that is listed as the current process when this happens. Given I can't just insert printk's in the context switch code (comments clearly say not to do so), I am wondering what else I can try to get to the bottom of how this odd state could happen. Any suggestions where to poke next given at least I can now trigger it pretty fast? -- Len Sorensen