Date: Thu, 23 Oct 2014 12:38:07 -0700
From: "Paul E. McKenney"
To: Oleg Nesterov
Cc: Dave Jones, Linux Kernel, htejun@gmail.com
Subject: Re: rcu_preempt detected stalls.
Message-ID: <20141023193807.GZ4977@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20141013173504.GA27955@redhat.com> <20141023183232.GW4977@linux.vnet.ibm.com> <20141023191319.GA5137@redhat.com>
In-Reply-To: <20141023191319.GA5137@redhat.com>

On Thu, Oct 23, 2014 at 09:13:19PM +0200, Oleg Nesterov wrote:
> On 10/23, Paul E. McKenney wrote:
> >
> > On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > > Today in "rcu stall while fuzzing" news:
> > >
> > > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > trinity-c342    R  running task    13384   766  32295 0x00000000
> > >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> > >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> > >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > > Call Trace:
> > >  [] preempt_schedule_irq+0x52/0xb0
> > >  [] retint_kernel+0x20/0x30
> > >  [] ? lock_acquire+0xd4/0x2b0
> > >  [] ? kill_pid_info+0x5/0x130
> > >  [] kill_pid_info+0x45/0x130
> > >  [] ? kill_pid_info+0x5/0x130
> > >  [] SYSC_kill+0xf2/0x2f0
> > >  [] ? SYSC_kill+0x9b/0x2f0
> > >  [] ? context_tracking_user_exit+0x57/0x280
> > >  [] ? syscall_trace_enter+0x13d/0x310
> > >  [] SyS_kill+0xe/0x10
> > >  [] tracesys+0xdd/0xe2
> >
> > Well, there is a loop in kill_pid_info().  I am surprised that it
> > would loop indefinitely, but if it did, you would certainly get
> > RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.
>
> Yes, this loop should not be a problem, we only restart if we race with
> a multi-threaded exec from a non-leader thread.
>
> But I already saw a couple of bug reports which look like task_struct
> corruption (->signal/creds == NULL), looks like something was broken
> recently. Perhaps an unbalanced put_task_struct...
>
> _Perhaps_ this is another case. If ->sighand was nullified then it will
> loop forever.

OK, so making each pass through the loop a separate RCU read-side
critical section might be considered to be suppressing notification
of an error condition?

							Thanx, Paul
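
To make the question above concrete, here is a minimal user-space sketch of
a retry loop in which every pass is its own read-side critical section. It
is only a model under stated assumptions: every name prefixed with model_
is a hypothetical stand-in invented for illustration, not the kernel's RCU
API, not the real kill_pid_info(), and not the patch referred to in the
thread. The counters exist only to show that each retry opens and closes a
fresh section.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, NOT the kernel's RCU primitives. */
int model_rcu_depth;     /* current read-side nesting level */
int model_rcu_sections;  /* outermost critical sections entered so far */

void model_rcu_read_lock(void)
{
	if (model_rcu_depth++ == 0)
		model_rcu_sections++;
}

void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Hypothetical task: reports -ESRCH for the first 'raced_passes' attempts. */
struct model_task {
	int raced_passes;
};

/* Stand-in for signal delivery that can race with an exec. */
int model_send_sig(struct model_task *t)
{
	if (t->raced_passes > 0) {
		t->raced_passes--;
		return -ESRCH;
	}
	return 0;
}

/*
 * Retry loop in which each pass is a separate read-side critical section.
 * A NULL task models a lookup that finds nothing; that case returns
 * -ESRCH without retrying.
 */
int model_kill_pid_info(struct model_task *t)
{
	for (;;) {
		int error = -ESRCH;

		model_rcu_read_lock();
		if (t)
			error = model_send_sig(t);
		model_rcu_read_unlock();

		if (!t || error != -ESRCH)
			return error;
		/* Raced: go around again in a new critical section. */
	}
}
```

In this model, a task that never stops returning -ESRCH (the analogue of a
nullified ->sighand) still loops forever. The difference is that no single
read-side critical section stays open, so a stall detector keyed on long
critical sections would have nothing to report, which is the concern raised
in the question above.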