Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966838AbbBDPe6 (ORCPT ); Wed, 4 Feb 2015 10:34:58 -0500 Received: from mailout4.w1.samsung.com ([210.118.77.14]:27259 "EHLO mailout4.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966370AbbBDPWd (ORCPT ); Wed, 4 Feb 2015 10:22:33 -0500 MIME-version: 1.0 Content-type: multipart/mixed; boundary="Boundary_(ID_cjP6p7DPiBprQ99gyu2RZA)" X-AuditID: cbfec7f4-b7f126d000001e9a-6a-54d238a5a6be Message-id: <1423063348.24415.10.camel@AMDC1943> Subject: Re: [rcu] [ INFO: suspicious RCU usage. ] From: Krzysztof Kozlowski To: paulmck@linux.vnet.ibm.com Cc: Russell King - ARM Linux , Fengguang Wu , LKP , linux-kernel@vger.kernel.org, Bartlomiej Zolnierkiewicz , linux-arm-kernel@lists.infradead.org, Arnd Bergmann , MarkRutland Date: Wed, 04 Feb 2015 16:22:28 +0100 In-reply-to: <20150204151028.GD5370@linux.vnet.ibm.com> References: <20150201025922.GA16820@wfg-t540p.sh.intel.com> <1422957702.17540.1.camel@AMDC1943> <20150203162704.GR19109@linux.vnet.ibm.com> <1423049947.19547.6.camel@AMDC1943> <20150204130018.GG8656@n2100.arm.linux.org.uk> <20150204131420.GC5370@linux.vnet.ibm.com> <1423059387.24415.2.camel@AMDC1943> <20150204151028.GD5370@linux.vnet.ibm.com> X-Mailer: Evolution 3.10.4-0ubuntu2 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmpnkeLIzCtJLcpLzFFi42I5/e/4Fd2lFpdCDC5Nk7D4O+kYu8XGGetZ Ld4/X89ssenxNVaLy7vmsFncvsxrsfJ4O6vF0usXmSzebv7O6sDpcX8vu8eaeWsYPVqae9g8 fv+axOixeM9LJo8HhzazeGxeUu/Rt2UVo8fnTXIBnFFcNimpOZllqUX6dglcGbsX9TIXTNKv eP1+G2sD42/VLkZODgkBE4kPXauZIGwxiQv31rN1MXJxCAksZZTofTaPBSTBKyAo8WPyPTCb WcBP4s/510wQRZ8ZJR4cv8kKUWQgMeX2Y3YQW1jASGLlu6lgU9kEjCU2L18CNJWDQ0RATmLN xCSQXmaBrUwSG67vYgOpYRFQlbh+cg1YL6eAucSFeVdZIBY8YZI4cOcfO0izhICyRGO/2wRG /llIbpqF5CYIW11i0rxFzLOAOpgF5CWOXMqGCKdL7P2yDS588LwsRDhOouvCa+YFjOyrGEVT S5MLipPScw31ihNzi0vz0vWS83M3MUIi7MsOxsXHrA4xCnAwKvHwNrRdDBFiTSwrrsw9xKgC NOfRhtUXGKVY8vLzUpVEeNeYXAoR4k1JrKxKLcqPLyrNSS0+xMjEwSnVwBjwba5Mmu61yRqc xnGaax4vNOJuqOlWlmuLyS3tO/3E+4X5gbmyi6deLJz5LH7WnZMHEkKdY68cnrfHgGfO++dn JP8Ft1fKvwxqir1vXGx0ebn5Lr7tR/XXbkm861/y8395scGbeBnFO48Wnjl62ZP7NN+Dla9b 8zr2ODOuq32w4303753icFUlluKMREMt5qLiRACfyOj1mgIAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6120 Lines: 167 --Boundary_(ID_cjP6p7DPiBprQ99gyu2RZA) Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 8BIT On śro, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote: > On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote: > > On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote: > > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote: > > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote: > > > > > +Cc some ARM people > > > > > > > > I wish that people would CC this list with problems seen on ARM. I'm > > > > minded to just ignore this message because of this in the hope that by > > > > doing so, people will learn something... > > > > > > > > > > Another thing I could do would be to have an arch-specific Kconfig > > > > > > variable that made ARM responsible for informing RCU that the CPU > > > > > > was departing, which would allow a call to as follows to be placed > > > > > > immediately after the complete(): > > > > > > > > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id()); > > > > > > > > > > > > Note: This absolutely requires that the rcu_cpu_notify() -always- > > > > > > be allowed to execute!!! This will not work if there is -any- possibility > > > > > > of __cpu_die() powering off the outgoing CPU before the call to > > > > > > rcu_cpu_notify() returns. > > > > > > > > Exactly, so that's not going to be possible. The completion at that > > > > point marks the point at which power _could_ be removed from the CPU > > > > going down. > > > > > > OK, sounds like a polling loop is required. > > > > I thought about using wait_on_bit() in __cpu_die() (the waiting thread) > > and clearing the bit on CPU being powered down. What do you think about > > such idea? > > Hmmm... It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(), > which in turn calls __wait_on_bit(), which calls prepare_to_wait() and > finish_wait(). These are in the scheduler, but this is being called from > the CPU that remains online, so that should be OK. > > But what do you invoke on the outgoing CPU? Can you get away with > simply clearing the bit, or do you also have to do a wakeup? It looks > to me like a wakeup is required, which would be illegal on the outgoing > CPU, which is at a point where it cannot legally invoke the scheduler. > Or am I missing something? Actually the timeout versions but I think that doesn't matter. The wait_on_bit will busy-loop with testing for the bit. Inside the loop it calls the 'action' which in my case will be bit_wait_io_timeout(). This calls schedule_timeout(). See proof of concept in attachment. One observed issue: hot unplug from commandline takes a lot more time. About 7 seconds instead of ~0.5. Probably I did something wrong. > > You know, this situation is giving me a bad case of nostalgia for the > old Sequent Symmetry and NUMA-Q hardware. On those platforms, the > outgoing CPU could turn itself off, and thus didn't need to tell some > other CPU when it was ready to be turned off. Seems to me that this > self-turn-off capability would be a great feature for future systems! There are a lot more issues with hotplug on ARM... Patch/RFC attached. --Boundary_(ID_cjP6p7DPiBprQ99gyu2RZA) Content-type: text/x-patch; CHARSET=US-ASCII; name=0001-ARM-Don-t-use-complete-during-__cpu_die.patch Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=0001-ARM-Don-t-use-complete-during-__cpu_die.patch >From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 4 Feb 2015 16:14:41 +0100 Subject: [RFC] ARM: Don't use complete() during __cpu_die The complete() should not be used on offlined CPU. Signed-off-by: Krzysztof Kozlowski --- arch/arm/kernel/smp.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 86ef244c5a24..f3a5ad80a253 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -76,6 +77,9 @@ enum ipi_msg_type { static DECLARE_COMPLETION(cpu_running); +#define CPU_DIE_WAIT_BIT 0 +static unsigned long wait_cpu_die; + static struct smp_operations smp_ops; void __init smp_set_ops(struct smp_operations *ops) @@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) pr_err("CPU%u: failed to boot: %d\n", cpu, ret); } - + set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die); memset(&secondary_data, 0, sizeof(secondary_data)); return ret; } @@ -213,7 +217,17 @@ int __cpu_disable(void) return 0; } -static DECLARE_COMPLETION(cpu_died); +static int wait_for_cpu_die(void) +{ + might_sleep(); + + if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die)) + return 0; + + return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT, + bit_wait_timeout, TASK_UNINTERRUPTIBLE, + msecs_to_jiffies(5000)); +} /* * called on the thread which is asking for a CPU to be shutdown - @@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died); */ void __cpu_die(unsigned int cpu) { - if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) { + if (wait_for_cpu_die()) { pr_err("CPU%u: cpu didn't die\n", cpu); return; } @@ -267,7 +281,7 @@ void __ref cpu_die(void) * this returns, power and/or clocks can be removed at any point * from this CPU and its cache by platform_cpu_kill(). */ - complete(&cpu_died); + clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die); /* * Ensure that the cache lines associated with that completion are -- 1.9.1 --Boundary_(ID_cjP6p7DPiBprQ99gyu2RZA)-- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/