Date: Thu, 23 Oct 2014 21:13:19 +0200
From: Oleg Nesterov
To: "Paul E. McKenney"
Cc: Dave Jones, Linux Kernel, htejun@gmail.com
Subject: Re: rcu_preempt detected stalls.
Message-ID: <20141023191319.GA5137@redhat.com>
References: <20141013173504.GA27955@redhat.com> <20141023183232.GW4977@linux.vnet.ibm.com>
In-Reply-To: <20141023183232.GW4977@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/23, Paul E. McKenney wrote:
>
> On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > Today in "rcu stall while fuzzing" news:
> >
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > trinity-c342    R  running task    13384   766  32295 0x00000000
> >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > Call Trace:
> >  [] preempt_schedule_irq+0x52/0xb0
> >  [] retint_kernel+0x20/0x30
> >  [] ? lock_acquire+0xd4/0x2b0
> >  [] ? kill_pid_info+0x5/0x130
> >  [] kill_pid_info+0x45/0x130
> >  [] ? kill_pid_info+0x5/0x130
> >  [] SYSC_kill+0xf2/0x2f0
> >  [] ? SYSC_kill+0x9b/0x2f0
> >  [] ? context_tracking_user_exit+0x57/0x280
> >  [] ? syscall_trace_enter+0x13d/0x310
> >  [] SyS_kill+0xe/0x10
> >  [] tracesys+0xdd/0xe2
>
> Well, there is a loop in kill_pid_info().  I am surprised that it
> would loop indefinitely, but if it did, you would certainly get
> RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.

Yes, this loop should not be a problem: we only restart if we race with
a multi-threaded exec from a non-leader thread.

But I have already seen a couple of bug reports which look like
task_struct corruption (->signal/creds == NULL), so it looks like
something was broken recently. Perhaps an unbalanced put_task_struct...

_Perhaps_ this is another case. If ->sighand was nullified then it will
loop forever.

Oleg.
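
For readers following the thread, here is a minimal userspace sketch of the retry pattern being discussed. It is not the kernel code. The struct layouts, the model_* names, the max_retries cap and the -EAGAIN result are all hypothetical, added only so the model terminates. It assumes that signal delivery reports -ESRCH when ->sighand is missing, which is what the last paragraph above implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct sighand { int placeholder; };
struct task {
    struct sighand *sighand; /* NULL models the suspected corruption */
    int hashed;              /* nonzero while the pid lookup still finds the task */
};

/* Models the pid lookup: finds the task only while it is still hashed. */
static struct task *model_lookup(struct task *t)
{
    return (t && t->hashed) ? t : NULL;
}

/* Models signal delivery: assumed to fail with -ESRCH without ->sighand. */
static int model_send_sig(const struct task *t)
{
    return t->sighand ? 0 : -ESRCH;
}

/*
 * Models the restart loop: retry while the lookup succeeds but delivery
 * says -ESRCH. The cap is an addition for this sketch; without it a task
 * that stays hashed with ->sighand == NULL would spin forever.
 */
int model_kill(struct task *t, int max_retries, int *retries)
{
    int error = -ESRCH;

    assert(retries != NULL);
    *retries = 0;
    for (;;) {
        struct task *p = model_lookup(t);
        if (!p)
            break;                  /* task really gone: report -ESRCH */
        error = model_send_sig(p);
        if (error != -ESRCH)
            break;                  /* delivered, or failed for another reason */
        if (++*retries >= max_retries) {
            error = -EAGAIN;        /* cap hit: the "loop forever" case */
            break;
        }
    }
    return error;
}
```

In this model a healthy task returns 0 with no retries, and an unhashed task returns -ESRCH with no retries. Only a task that stays hashed with a NULL ->sighand keeps restarting until the cap is reached.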