Date: Fri, 10 Jan 2014 18:02:24 +0100
From: Peter Zijlstra
To: Waiman Long
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List, Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds
Subject: Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
Message-ID: <20140110170223.GD8224@laptop.programming.kicks-ass.net>
References: <52D011C9.7000209@hp.com> <20140110165822.GI7572@laptop.programming.kicks-ass.net>
In-Reply-To: <20140110165822.GI7572@laptop.programming.kicks-ass.net>

On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
> > Peter,
> >
> > Call Trace:
> >  [] dump_stack+0x49/0x62
> >  [] warn_slowpath_common+0x8c/0xc0
> >  [] warn_slowpath_null+0x1a/0x20
> >  [] force_sig_info+0x131/0x140
> >  [] force_sig_info_fault+0x5f/0x70
> >  [] ? search_exception_tables+0x2a/0x50
> >  [] ? fixup_exception+0x1d/0x70
> >  [] no_context+0x159/0x1f0
> >  [] __bad_area_nosemaphore+0x12d/0x230
> >  [] ? __bad_area_nosemaphore+0x12d/0x230
> >  [] bad_area_nosemaphore+0x13/0x20
> >  [] __do_page_fault+0x362/0x480
> >  [] ? __do_page_fault+0x362/0x480
> >  [] do_page_fault+0xe/0x10
> >  [] page_fault+0x22/0x30
> >  [] ? bad_to_user+0x5e/0x66b
> >  [] copy_from_user_nmi+0x76/0x90
> >  [] perf_callchain_user+0xd0/0x360
> >  [] perf_callchain+0x1af/0x1f0
> >  [] perf_prepare_sample+0x2f3/0x3a0
> >  [] __perf_event_overflow+0x10f/0x220
> >  [] perf_event_overflow+0x14/0x20
> >  [] intel_pmu_handle_irq+0x1de/0x3c0
> >  [] ? emulate_vsyscall+0x144/0x390
> >  [] perf_event_nmi_handler+0x34/0x60
> >  [] nmi_handle+0x8a/0x170
> >  [] default_do_nmi+0x68/0x210
> >  [] do_nmi+0x90/0xe0
> >  [] end_repeat_nmi+0x1e/0x2e
> >  [] ? emulate_vsyscall+0x144/0x390
> >  [] ? emulate_vsyscall+0x144/0x390
> >  [] ? emulate_vsyscall+0x144/0x390
> >  <>  [] __bad_area_nosemaphore+0x21d/0x230
> >  [] bad_area_nosemaphore+0x13/0x20
> >  [] __do_page_fault+0x362/0x480
> >  [] ? vm_mmap_pgoff+0xbc/0xe0
> >  [] do_page_fault+0xe/0x10
> >  [] page_fault+0x22/0x30
> > ---[ end trace 037bf09d279751ec ]---
> >
> > So this is a double page fault. Looking at relevant changes in the
> > 3.13 kernel, I spotted the following patch, which modified the
> > perf_callchain_user() function that shows up in the stack trace above:
> >
>
> Hurm, that's an expected double fault, not something we should take the
> process down for.
>
> I'll have to look at how all that works for a bit.

How easily can you reproduce this? Could you test something like the
below, which would allow us to take double faults from NMI context?

---
 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ff85bb8dd69..18c498d4274d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
 	/* Are we prepared to handle this kernel fault? */
 	if (fixup_exception(regs)) {
-		if (current_thread_info()->sig_on_uaccess_error && signal) {
+		if (!in_nmi() && current_thread_info()->sig_on_uaccess_error && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
 			tsk->thread.error_code = error_code | PF_USER;
 			tsk->thread.cr2 = address;
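For readers following the patch, here is a minimal userspace model (not kernel code) of the one condition it changes in no_context(). The function names are hypothetical, and the parameters are assumed stand-ins for in_nmi(), current_thread_info()->sig_on_uaccess_error and the signal argument. It only covers the branch taken after fixup_exception() has found a fixup. Before the patch, a fault taken from NMI context (copy_from_user_nmi() in the trace above, with the NMI landing in emulate_vsyscall()) could force a signal on the interrupted task whenever that task had sig_on_uaccess_error set. With the patch, NMI context never forces the signal, and the outcome outside NMI is unchanged.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in no_context() once fixup_exception()
 * has succeeded. Parameters are stand-ins for kernel state:
 *   in_nmi_ctx           ~ in_nmi()
 *   sig_on_uaccess_error ~ current_thread_info()->sig_on_uaccess_error
 *   sig                  ~ the 'signal' argument (0 means "no signal")
 */

/* Condition before the patch: NMI context is not considered at all. */
static bool forced_signal_before_patch(bool sig_on_uaccess_error, int sig)
{
	return sig_on_uaccess_error && sig != 0;
}

/*
 * Condition after the patch: a fault taken from NMI context (e.g. perf
 * walking a user stack) only uses the exception-table fixup and never
 * forces a signal on whatever task the NMI happened to interrupt.
 */
static bool forces_signal_after_fixup(bool in_nmi_ctx,
				      bool sig_on_uaccess_error, int sig)
{
	return !in_nmi_ctx && sig_on_uaccess_error && sig != 0;
}
```

The only row of the truth table that changes is the one with in_nmi_ctx true and sig_on_uaccess_error set: it used to force a signal and now does not.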