Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp2793359ybi; Thu, 18 Jul 2019 14:35:01 -0700 (PDT) X-Google-Smtp-Source: APXvYqwkiVvaW19ojYDzxSHzAHMGDfGTPaXLSfGDqv5TmozNZgYMmBXsrR0xoaP7SvLSKL83SO/R X-Received: by 2002:a17:90a:2525:: with SMTP id j34mr55068980pje.11.1563485701213; Thu, 18 Jul 2019 14:35:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563485701; cv=none; d=google.com; s=arc-20160816; b=MC5Gh1aZ6iFbIwtMeitjVl4gdomFiwS+FeIYScEulxwGmN5k8i/wxeHv6C8fD/vG0u 0EpGcShAGvfyGq19hnTD/LmVSCmTGF5tkAFZUAPWtBbuLasCFklNtixQIJ4+V2GpaAAs iF788z5tsN/fx9XeGP4m6i+/scIIwplWf9wsceWTi/moEKEPWDO2Ax5C6SasUdXO3+OA eZlSM7KHyzdLPO9pwiR7HpBktik4DZHpxVdCqNVy5FPamdvQpZZliPlgwEA2jS8Dymyc P8hVbYhmTV5wHSDZ/U2GZ54uDatMAa2ICUeUKTG5pvL9TM89CpqjFjoqIPDmA8mDGYvr sJVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:subject:cc:to :from:date; bh=e8yo8LPzhFR6YbIUHnYiToiYykekjxt0VEMOuDocJNw=; b=kZOHFWpPuvwZzLos1LNqHyONvr1VrWaHa5LSN8XsJJHDQsWLkGLuLbJmJwCqFXX+xT iWZny7RWVxAYlTVNt5cSMf9/0wZxbb87JTZZ+Ev/oeVe2HP8eQPyG233pQyekgJJFqzt h1+LFQWfklbMcdDJKnSRCNIDqzBc8KCNo3Ama1MU4bd3BA9LWFxqCyYE0MHa8cMMI9DG betdD2IANirCagx6Wi4XNCqTog/EAA8U8s6KJOpjh0/L/ZOJVK7UYaMvDq087Buopa2V bl6uWYdzRI4uYLdSi6VIsZfG5Q4OOnaWKOqe3FHoXW1CFGRlV6tET7oebe0wJooqMB5r YBog== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id bc7si740600plb.55.2019.07.18.14.34.45; Thu, 18 Jul 2019 14:35:01 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728040AbfGRVe1 (ORCPT + 99 others); Thu, 18 Jul 2019 17:34:27 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:30328 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727767AbfGRVe0 (ORCPT ); Thu, 18 Jul 2019 17:34:26 -0400 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6ILVwCO105300 for ; Thu, 18 Jul 2019 17:34:24 -0400 Received: from e14.ny.us.ibm.com (e14.ny.us.ibm.com [129.33.205.204]) by mx0a-001b2d01.pphosted.com with ESMTP id 2tu0s302cp-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 18 Jul 2019 17:34:24 -0400 Received: from localhost by e14.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 18 Jul 2019 22:34:23 +0100 Received: from b01cxnp23034.gho.pok.ibm.com (9.57.198.29) by e14.ny.us.ibm.com (146.89.104.201) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 18 Jul 2019 22:34:21 +0100 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp23034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6ILYKks38011172 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 18 Jul 2019 21:34:20 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0F98BB205F; Thu, 18 Jul 2019 21:34:20 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CC7C8B2066; Thu, 18 Jul 2019 21:34:19 +0000 (GMT) Received: from paulmck-ThinkPad-W541 (unknown [9.80.219.160]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Thu, 18 Jul 2019 21:34:19 +0000 (GMT) Received: by paulmck-ThinkPad-W541 (Postfix, from userid 1000) id AA0E916C0E4B; Thu, 18 Jul 2019 14:34:19 -0700 (PDT) Date: Thu, 18 Jul 2019 14:34:19 -0700 From: "Paul E. McKenney" To: Joel Fernandes Cc: Byungchul Park , Byungchul Park , rcu , LKML , kernel-team@lge.com Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable Reply-To: paulmck@linux.ibm.com References: <20190711164818.GA260447@google.com> <20190711195839.GA163275@google.com> <20190712063240.GD7702@X58A-UD3R> <20190712125116.GB92297@google.com> <20190713151330.GE26519@linux.ibm.com> <20190713154257.GE133650@google.com> <20190713174111.GG26519@linux.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 19071821-0052-0000-0000-000003E2EA26 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00011454; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000287; SDB=6.01234098; UDB=6.00650323; IPR=6.01015426; MB=3.00027787; MTD=3.00000008; XFM=3.00000015; UTC=2019-07-18 21:34:22 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19071821-0053-0000-0000-000061C09707 Message-Id: <20190718213419.GV14271@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-07-18_11:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1907180218 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote: > Trimming the list a bit to keep my noise level low, > > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney wrote: > [snip] > > > It still feels like you guys are hyperfocusing on this one particular > > > > knob. I instead need you to look at the interrelating knobs as a group. > > > > > > Thanks for the hints, we'll do that. > > > > > > > On the debugging side, suppose someone gives you an RCU bug report. > > > > What information will you need? How can you best get that information > > > > without excessive numbers of over-and-back interactions with the guy > > > > reporting the bug? As part of this last question, what information is > > > > normally supplied with the bug? Alternatively, what information are > > > > bug reporters normally expected to provide when asked? > > > > > > I suppose I could dig out some of our Android bug reports of the past where > > > there were RCU issues but if there's any fires you are currently fighting do > > > send it our way as debugging homework ;-) > > > > Suppose that you were getting RCU CPU stall > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread(). > > Of course, this really means that some other CPU/task is holding up > > multi_cpu_stop() without also blocking the current grace period. > > > > So I took a shot at this trying to learn how CPU stoppers work in > relation to this problem. > > I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state > in multi_cpu_stop() but another CPU Y has not yet entered this state. > So CPU X is stalling RCU but it is really because of CPU Y. Now in the > problem statement, you mentioned CPU Y is not holding up the grace > period, which means Y doesn't have any of IRQ, BH or preemption > disabled ; but is still somehow stalling RCU indirectly by troubling > X. > > This can only happen if : > - CPU Y has a thread executing on it that is higher priority than CPU > X's stopper thread which prevents it from getting scheduled. - but the > CPU stopper thread (migration/..) is highest priority RT so this would > be some kind of an odd scheduler bug. > - There is a bug in the CPU stopper machinery itself preventing it > from scheduling the stopper on Y. Even though Y is not holding up the > grace period. - CPU Y might have already passed through its quiescent state for the current grace period, then disabled IRQs indefinitely. Now, CPU Y would block a later grace period, but CPU X is preventing the current grace period from ending, so no such later grace period can start. > Did I get that right? Would be exciting to run the rcutorture test > once Paul has it available to reproduce this problem. Working on it! Slow, I know! Thanx, Paul