Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754026Ab3GJJwX (ORCPT ); Wed, 10 Jul 2013 05:52:23 -0400 Received: from merlin.infradead.org ([205.233.59.134]:44602 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753782Ab3GJJwV (ORCPT ); Wed, 10 Jul 2013 05:52:21 -0400 Date: Wed, 10 Jul 2013 11:52:10 +0200 From: Peter Zijlstra To: Sasha Levin Cc: Dave Jones , Tejun Heo , tglx@linutronix.de, LKML , trinity@vger.kernel.org, rostedt@goodmis.org Subject: Re: timer: lockup in run_timer_softirq() Message-ID: <20130710095210.GD17211@twins.programming.kicks-ass.net> References: <51DC902F.3070403@oracle.com> <20130709224706.GA13855@redhat.com> <51DC9379.9050408@oracle.com> <20130709230316.GB13855@redhat.com> <51DC983F.2010409@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51DC983F.2010409@oracle.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4180 Lines: 75 On Tue, Jul 09, 2013 at 07:09:51PM -0400, Sasha Levin wrote: > perf huh? I also have this spew I'm currently working on, it seems related: > > [ 1443.380407] ------------[ cut here ]------------ > [ 1443.381713] WARNING: CPU: 20 PID: 49263 at kernel/lockdep.c:3552 check_flags+0x16b/0x220() > [ 1443.383988] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled) > [ 1443.385452] Modules linked in: > [ 1443.386459] CPU: 20 PID: 49263 Comm: trinity-child50 Tainted: G W > 3.10.0-next-20130709-sasha #3953 > [ 1443.388735] 0000000000000de0 ffff880805e03ab8 ffffffff84191329 ffffffff8519f386 > [ 1443.390082] ffff880805e03b08 ffff880805e03af8 ffffffff811279cc ffffffff8519f386 > [ 1443.390082] 0000000000000000 ffff8807cf898000 ffffffff85a66940 0000000000000082 > [ 1443.390082] Call Trace: > [ 1443.390082] [] dump_stack+0x52/0x87 > [ 1443.390082] [] warn_slowpath_common+0x8c/0xc0 > [ 1443.390082] [] warn_slowpath_fmt+0x46/0x50 > [ 1443.390082] [] ? sub_preempt_count+0x9/0xf0 > [ 1443.390082] [] check_flags+0x16b/0x220 > [ 1443.390082] [] lock_is_held+0x77/0x110 > [ 1443.390082] [] ? sub_preempt_count+0x4/0xf0 > [ 1443.390082] [] perf_tp_event+0xbe/0x450 > [ 1443.390082] [] ? perf_tp_event+0x422/0x450 > [ 1443.390082] [] ? sub_preempt_count+0x4/0xf0 > [ 1443.390082] [] perf_ftrace_function_call+0xc2/0xe0 > [ 1443.390082] [] ? ftrace_ops_control_func+0xc8/0x140 > [ 1443.427495] [] ? sub_preempt_count+0x4/0xf0 > [ 1443.427495] [] ? __local_bh_enable+0xc7/0xd0 > [ 1443.427495] [] ftrace_ops_control_func+0xc8/0x140 > [ 1443.427495] [] ftrace_call+0x5/0x2f > [ 1443.427495] [] ? ftrace_call+0x5/0x2f > [ 1443.427495] [] ? vtime_account_irq_exit+0x67/0x80 > [ 1443.427495] [] ? sub_preempt_count+0x9/0xf0 > [ 1443.427495] [] ? sub_preempt_count+0x9/0xf0 > [ 1443.427495] [] ? __local_bh_enable+0xe/0xd0 > [ 1443.427495] [] ? sub_preempt_count+0x9/0xf0 > [ 1443.427495] [] __local_bh_enable+0xc7/0xd0 > [ 1443.427495] [] __do_softirq+0x447/0x4d0 > [ 1443.427495] [] ? sub_preempt_count+0x9/0xf0 > [ 1443.427495] [] irq_exit+0x86/0x120 > [ 1443.427495] [] smp_apic_timer_interrupt+0x4a/0x60 > [ 1443.427495] [] apic_timer_interrupt+0x72/0x80 > [ 1443.427495] [] ? do_setitimer+0x242/0x2a0 > [ 1443.427495] [] ? _raw_spin_unlock_irq+0x4c/0x80 > [ 1443.427495] [] ? _raw_spin_unlock_irq+0x30/0x80 > [ 1443.427495] [] do_setitimer+0x242/0x2a0 > [ 1443.427495] [] alarm_setitimer+0x3a/0x70 > [ 1443.427495] [] SyS_alarm+0xe/0x20 > [ 1443.427495] [] tracesys+0xdd/0xe2 > [ 1443.427495] ---[ end trace e3b9a6b9c7462a51 ]--- Fun.. :-) we trace __local_bh_enable() and hit a ftrace callback between telling lockdep we enabled softirqs and actually doing so. I'm just a tad confused by the trace; it says we go: lock_is_held() check_flags() Looking at perf_tp_event() this would most likely be from: ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]); Where the lock_is_held() would be from rcu_dereference_check()'s rcu_read_lock_sched_held(). However, by there we've already passed rcu_read_lock() which includes rcu_lock_acquire() -> lock_acquire() -> check_flags(). So it should've triggered there. Ideally we'd not trace __local_bh_enable() at all, seeing as how any RCU usage in there would be bound to trigger this. Steven? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/