Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751311AbaJIUKP (ORCPT ); Thu, 9 Oct 2014 16:10:15 -0400
Received: from mx1.redhat.com ([209.132.183.28]:20468 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750853AbaJIUKM (ORCPT ); Thu, 9 Oct 2014 16:10:12 -0400
Date: Thu, 9 Oct 2014 16:09:55 -0400
From: Dave Jones
To: Tejun Heo
Cc: "Paul E. McKenney", Linux Kernel
Subject: Re: RCU stalls -> lockup.
Message-ID: <20141009200955.GA15094@redhat.com>
Mail-Followup-To: Dave Jones, Tejun Heo, "Paul E. McKenney", Linux Kernel
References: <20141002175515.GA28665@redhat.com>
        <20141002193655.GS5015@linux.vnet.ibm.com>
        <20141005021556.GC8549@htj.dyndns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141005021556.GC8549@htj.dyndns.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 04, 2014 at 10:15:56PM -0400, Tejun Heo wrote:
> On Thu, Oct 02, 2014 at 12:36:55PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 02, 2014 at 01:55:15PM -0400, Dave Jones wrote:
> > > I just hit this on my box running 3.17rc7
> > > It was followed by a userspace lockup. (Could still ping, and sysrq
> > > from the console, but even getty wasn't responding on the console).
> > >
> > > I was trying to reproduce another bug faster, and had ramped up the
> > > number of processes trinity to uses to 512. This didn't take long
> > > to fall out..
> >
> > This might be related to an exchange I had with Tejun (CCed), where
> > the work queues were running all out, preventing any quiescent states
> > from happening. One fix under consideration is to add a quiescent state,
> > similar to the one in softirq handling.
>
> Dave, can you please test whether the following patch makes a
> difference if the problem is reproducible?
>
> http://lkml.kernel.org/r/20141003153701.7c7da030@jlaw-desktop.mno.stratus.com

The only rcu related stuff I'm seeing now is the spew below, and unlike
the issue before the above patch, it does seem to recover at least..

        Dave

INFO: rcu_preempt detected stalls on CPUs/tasks:
        Tasks blocked on level-0 rcu_node (CPUs 0-3): P5890 P6169 P6164
        Tasks blocked on level-0 rcu_node (CPUs 0-3): P5890 P6169 P6164
        (detected by 0, t=6502 jiffies, g=51433, c=51432, q=0)
trinity-c393 R running task 12808 5890 5008 0x00000000
 ffff880235b6bd08 0000000000000002 00000000001d8088 ffff88019c1ac680
 00000000001d4080 0000000000000002 ffff880235b6bfd8 00000000001d4080
 ffff880011e80000 ffff88019c1ac680 ffff880235b6bfd8 0000000000000000
Call Trace:
 [] ? perf_event_comm_output+0x1e0/0x1e0
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? perf_event_mmap+0x24e/0x370
 [] ? perf_event_aux+0xe4/0x380
 [] ? perf_event_aux+0xff/0x380
 [] ? perf_cpu_notify+0x50/0x50
 [] perf_event_mmap+0x24e/0x370
 [] do_brk+0x24d/0x350
 [] SyS_brk+0x14e/0x170
 [] tracesys+0xdd/0xe2
trinity-c375 R running task 14696 6169 5872 0x00000000
 ffff8801cfd17e58 0000000000000002 ffff8801a7319780 ffff8801a7319780
 00000000001d4080 0000000000000000 ffff8801cfd17fd8 00000000001d4080
 ffff88008b49af00 ffff8801a7319780 ffff8801cfd17fd8 0000000000000000
Call Trace:
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? __task_pid_nr_ns+0x10d/0x1b0
 [] ? rcu_is_watching+0x34/0x60
 [] ? __task_pid_nr_ns+0x93/0x1b0
 [] ? __task_pid_nr_ns+0x10d/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] schedule_tail+0x5e/0xb0
 [] ret_from_fork+0xf/0xb0
trinity-c377 R running task 14632 6164 5874 0x00000000
 ffff88000c397df8 0000000000000002 0000000000000002 ffff880066d59780
 00000000001d4080 0000000000000000 ffff88000c397fd8 00000000001d4080
 ffff88008b49af00 ffff880066d59780 ffff88000c397fd8 0000000000000000
Call Trace:
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? lock_acquire+0x9d/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] __task_pid_nr_ns+0x43/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] schedule_tail+0x5e/0xb0
 [] ret_from_fork+0xf/0xb0
trinity-c393 R running task 12808 5890 5008 0x00000000
 ffff880235b6bd08 0000000000000002 00000000001d8088 ffff88019c1ac680
 00000000001d4080 0000000000000002 ffff880235b6bfd8 00000000001d4080
 ffff880011e80000 ffff88019c1ac680 ffff880235b6bfd8 0000000000000000
Call Trace:
 [] ? perf_event_comm_output+0x1e0/0x1e0
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? perf_event_mmap+0x24e/0x370
 [] ? perf_event_aux+0xe4/0x380
 [] ? perf_event_aux+0xff/0x380
 [] ? perf_cpu_notify+0x50/0x50
 [] perf_event_mmap+0x24e/0x370
 [] do_brk+0x24d/0x350
 [] SyS_brk+0x14e/0x170
 [] tracesys+0xdd/0xe2
trinity-c375 R running task 14696 6169 5872 0x00000000
 ffff8801cfd17e58 0000000000000002 ffff8801a7319780 ffff8801a7319780
 00000000001d4080 0000000000000000 ffff8801cfd17fd8 00000000001d4080
 ffff88008b49af00 ffff8801a7319780 ffff8801cfd17fd8 0000000000000000
Call Trace:
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? __task_pid_nr_ns+0x10d/0x1b0
 [] ? rcu_is_watching+0x34/0x60
 [] ? __task_pid_nr_ns+0x93/0x1b0
 [] ? __task_pid_nr_ns+0x10d/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] schedule_tail+0x5e/0xb0
 [] ret_from_fork+0xf/0xb0
trinity-c377 R running task 14632 6164 5874 0x00000000
 ffff88000c397df8 0000000000000002 0000000000000002 ffff880066d59780
 00000000001d4080 0000000000000000 ffff88000c397fd8 00000000001d4080
 ffff88008b49af00 ffff880066d59780 ffff88000c397fd8 0000000000000000
Call Trace:
 [] preempt_schedule_irq+0x52/0xb0
 [] retint_kernel+0x20/0x30
 [] ? lock_acquire+0x9d/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] __task_pid_nr_ns+0x43/0x1b0
 [] ? __task_pid_nr_ns+0x5/0x1b0
 [] schedule_tail+0x5e/0xb0
 [] ret_from_fork+0xf/0xb0
INFO: rcu_preempt detected stalls on CPUs/tasks:
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        (detected by 0, t=26007 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks:
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        (detected by 0, t=45512 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks:
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        Tasks blocked on level-0 rcu_node (CPUs 0-3):
        (detected by 0, t=65017 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start