Date: Wed, 14 Nov 2018 12:20:13 -0800
From: "Paul E. McKenney"
To: Ville Syrjälä
Cc: linux-kernel@vger.kernel.org, Andi Kleen, "Rafael J. Wysocki",
    Viresh Kumar, Ingo Molnar, Borislav Petkov, "H. Peter Anvin"
Subject: Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Reply-To: paulmck@linux.ibm.com
Message-Id: <20181114202013.GA27603@linux.ibm.com>
In-Reply-To: <20181113151037.GG4170@linux.ibm.com>

On Tue, Nov 13, 2018 at 07:10:37AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> > Hi Paul,
> >
> > After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> > I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> > API in terms of RCU for Tree RCU PREEMPT builds").
> >
> > I traced the hang into
> > -> cpufreq_suspend()
> >  -> cpufreq_stop_governor()
> >   -> cpufreq_dbs_governor_stop()
> >    -> gov_clear_update_util()
> >     -> synchronize_sched()
> >      -> synchronize_rcu()
> >
> > Only PREEMPT=y is affected for obvious reasons, but that couldn't
> > explain why the same UP kernel booted on an SMP machine worked fine.
> > Eventually I realized that the difference between working and
> > non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> > that we mask everything in the PIC before cpufreq is shut down,
> > and came up with the following fix:
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 7aa3dcad2175..f88bf3c77fc0 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
> >  	return 0;
> >  }
> >  module_param(off, int, 0444);
> > -core_initcall(cpufreq_core_init);
> > +late_initcall(cpufreq_core_init);
>
> Thank you for testing this and tracking it down!
>
> I am glad that you have a fix, but I hope that we can arrive at a less
> constraining one.
>
> > Here's the resulting change in initcall_debug:
> >   pci 0000:00:00.1: shutdown
> >   hub 4-0:1.0: hub_ext_port_status failed (err = -110)
> >   agpgart-intel 0000:00:00.0: shutdown
> > + PM: Calling cpufreq_suspend+0x0/0x100
> >   PM: Calling mce_syscore_shutdown+0x0/0x10
> >   PM: Calling i8259A_shutdown+0x0/0x10
> > - PM: Calling cpufreq_suspend+0x0/0x100
> > + reboot: Restarting system
> > + reboot: machine restart
> >
> > I didn't really look into what other ramifications the cpufreq
> > initcall change might have. cpufreq_global_kobject worries
> > me a bit. Maybe that one has to remain in core_initcall() and
> > we could just move the suspend to late_initcall()? Anyways,
> > I figured I'd leave this for someone more familiar with the
> > code to figure out ;)
>
> Let me guess...
>
> When the system suspends or shuts down, there comes a point after which
> only a single CPU is running, with preemption and interrupts disabled.
> At this point, RCU must change the way that it works, and the commit
> you bisected to would make the change more necessary. But if I am
> guessing correctly, we have just been getting lucky in the past.
>
> It looks like RCU needs to create a struct syscore_ops with a shutdown
> function and pass this to register_syscore_ops(). Maybe a suspend
> function as well. And RCU needs to invoke register_syscore_ops() at
> a time that causes RCU's shutdown function to be invoked in the right
> order with respect to the other work in flight. The hope would be that
> RCU's suspend function gets called just as the system transitions into
> a mode where the scheduler is no longer active, give or take.
>
> Does this make sense, or am I confused?

Well, it certainly does not make sense in that blocking is still legal
at .shutdown() invocation time, which means that RCU cannot revert to
its boot-time approach at that point. Looks like I need hooks in a
bunch of arch-dependent functions. Which is certainly doable, but will
take a bit more digging.

							Thanx, Paul
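
For readers following the ordering argument in the quoted text, here is a
minimal user-space sketch of why moving cpufreq_core_init() from
core_initcall() to late_initcall() changes where cpufreq_suspend() runs
relative to i8259A_shutdown(). It is not kernel code. It assumes that
register_syscore_ops() appends to the tail of a list and that the
shutdown pass walks that list in reverse, so hooks registered later run
earlier. Every model_* name and the two sample hooks are hypothetical
stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct syscore_ops: only .shutdown is modelled. */
struct model_syscore_ops {
	const char *name;
	void (*shutdown)(void);
};

#define MODEL_MAX_OPS 8

/* Registered hooks, in registration order (assumed tail insertion). */
static struct model_syscore_ops *model_ops[MODEL_MAX_OPS];
static size_t model_nr_ops;

/* Names of the shutdown hooks in the order they actually ran. */
static const char *model_log[MODEL_MAX_OPS];
static size_t model_log_len;

static void model_record(const char *name)
{
	assert(model_log_len < MODEL_MAX_OPS);
	model_log[model_log_len++] = name;
}

/* Models register_syscore_ops(): append to the end of the list. */
static void model_register_syscore_ops(struct model_syscore_ops *ops)
{
	assert(model_nr_ops < MODEL_MAX_OPS);
	model_ops[model_nr_ops++] = ops;
}

/* Models the shutdown pass: reverse registration order (assumed). */
static void model_syscore_shutdown(void)
{
	for (size_t i = model_nr_ops; i-- > 0; )
		if (model_ops[i]->shutdown)
			model_ops[i]->shutdown();
}

/* Hypothetical hooks standing in for cpufreq_suspend() and i8259A_shutdown(). */
static void model_cpufreq_shutdown(void) { model_record("cpufreq"); }
static void model_pic_shutdown(void)     { model_record("i8259A"); }

static struct model_syscore_ops model_cpufreq_ops = {
	.name = "cpufreq", .shutdown = model_cpufreq_shutdown,
};
static struct model_syscore_ops model_pic_ops = {
	.name = "i8259A", .shutdown = model_pic_shutdown,
};

/*
 * Reset the model, register the hooks in the given order, run the
 * shutdown pass, and return how many hooks ran.
 */
static size_t model_run(struct model_syscore_ops **order, size_t n)
{
	model_nr_ops = 0;
	model_log_len = 0;
	for (size_t i = 0; i < n; i++)
		model_register_syscore_ops(order[i]);
	model_syscore_shutdown();
	return model_log_len;
}
```

Registering the cpufreq hook first (the core_initcall() case) makes it
run after the PIC hook in this model, and registering it last (the
late_initcall() case) makes it run before, which matches the direction
of the change in the initcall_debug output quoted above. The sketch says
nothing about whether blocking is legal when the hooks run, which is the
problem raised in the reply.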